Initial commit

2020-11-19 12:18:53 +01:00
commit d7e2349555
@@ -0,0 +1,139 @@
+# Cuneiform-Sign-Detection-Code
+
+Author: Tobias Dencker - <tobias.dencker@gmail.com>
+
+This is the code repository for the article submission on "Deep learning of cuneiform sign detection with weak supervision using transliteration alignment".
+
+This repository contains code to execute the proposed iterative training procedure as well as code to evaluate and visualize results.
+Moreover, we provide pre-trained models of the cuneiform sign detector for Neo-Assyrian script after iterative training on the [Cuneiform Sign Detection Dataset](https://compvis.github.io/cuneiform-sign-detection-dataset/).
+Finally, we provide a web application for the analysis of tablet images with the help of a pre-trained cuneiform sign detector.
+
+<img src="http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/functions/images_decent.jpg" alt="Sign detections on tablet images: yellow boxes indicate TP and blue boxes FP detections" width="700"/>
+<!--- <img src="http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/functions/images_difficult.jpg" alt="Web interface detection" width="500"/> -->
+
+## Repository description
+
+- General structure:
+    - `data`: tablet images, annotations, transliterations, metadata
+    - `experiments`: training, testing, evaluation and visualization
+    - `lib`: project library code
+    - `results`: generated detections (placed, raw and aligned), network weights, logs
+    - `scripts`: scripts to run the alignment and placement step of iterative training
+
+
+### Use cases
+
+- Pre-processing of training data
+    - line detection
+- Iterative training
+    - generate sign annotations (aligned and placed detections)
+    - sign detector training   
+- Evaluation (on test set)
+    - raw detections
+    - placed detections
+    - aligned detections
+- Test & visualize
+    - line segmentation and post-processing
+    - line-level and sign-level alignments
+    - TP/FP for raw, aligned and placed detections (full tablet and crop level)
+
+
+### Pre-processing
+Before iterative training, line detections are obtained for all tablet images of the training data as a pre-processing step.
+- use the Jupyter notebooks in `experiments/line_segmentation/` to train and evaluate the line segmentation network and to perform line detection on all tablet images of the training set
+
+
+### Training
+*Iterative training* alternates between generating aligned and placed detections and training a new sign detector:
+1. use the command-line scripts (`scripts/generate/`) to run the alignment and placement steps of iterative training
+2. use the Jupyter notebooks (`experiments/sign_detector/`) to run the sign detector training step of iterative training
+
+To keep track of the sign detector and the generated sign annotations of each iteration of iterative training (stored in `results/`),
+we label each sign detector with a *model version* (e.g. v002),
+which is also used to label the raw, aligned and placed detections based on this detector.
+Besides providing a model version, a user also selects which subsets of the training data to use for the generation of new annotations.
+In particular, *subsets of SAAo collections* (e.g. saa01, saa05, saa08) are selected when running the scripts under `scripts/generate/`.
+To enable evaluation on the test set, the collections test and saa06 must be included.
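
The versioning convention can be sketched as follows. This is a minimal illustration and not the repository's code: `version_label`, `results_folder` and `iteration_plan` are hypothetical helpers, the folder pattern `results_ssd/<version>/` is borrowed from the `gen_folder` setting in the notebooks, and the assumption that the detector trained on annotations of version N receives version N+1 is ours.

```python
# Hypothetical helpers that illustrate the model-version convention.

def version_label(iteration):
    """Format an iteration number as a model version, e.g. 2 -> 'v002'."""
    return 'v{:03d}'.format(iteration)


def results_folder(model_version):
    """Folder holding the detections generated with a given detector version."""
    return 'results_ssd/{}/'.format(model_version)


def iteration_plan(num_iterations, train_collections,
                   eval_collections=('test', 'saa06')):
    """List, per iteration, which detector generates annotations for which collections."""
    plan = []
    for it in range(1, num_iterations + 1):
        collections = list(train_collections)
        # append the collections required for evaluation on the test set
        collections += [c for c in eval_collections if c not in collections]
        plan.append({
            'generate_with': version_label(it),        # detector used for alignment and placement
            'annotations_in': results_folder(version_label(it)),
            'train_next': version_label(it + 1),       # assumed label of the newly trained detector
            'collections': collections,
        })
    return plan
```

For example, `iteration_plan(2, ['saa01', 'saa05'])` returns two entries; the first one generates annotations with `v001`, stores them under `results_ssd/v001/`, and adds `test` and `saa06` to the selected collections.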
+
+
+### Evaluation
+Use the [*test sign detector notebook*](./experiments/sign_detector/test_sign_detector.ipynb) to measure the performance (mAP) of the trained sign detector on the test set or on other subsets of the dataset.
+In `experiments/alignment_evaluation/` you will find further notebooks for the evaluation and visualization of line-level and sign-level alignments and of TP/FP for raw, aligned and placed detections (full tablet and crop level).
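
For readers unfamiliar with the TP/FP terminology: a detection is commonly counted as a true positive when it overlaps a ground-truth box of the same class sufficiently, and as a false positive otherwise. The sketch below shows this generic matching rule only. The IoU threshold of 0.5, the box format `(x1, y1, x2, y2)` and the function names are assumptions, and the evaluation notebooks may use different settings.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return float(inter) / union if union > 0 else 0.0


def label_detections(detections, ground_truth, iou_thresh=0.5):
    """Mark each detection as 'TP' or 'FP'.

    detections:   list of (box, class_id, score)
    ground_truth: list of (box, class_id)
    Detections are processed by descending score, and every ground-truth
    box can be matched at most once.
    """
    matched = set()
    results = [None] * len(detections)
    order = sorted(range(len(detections)), key=lambda i: -detections[i][2])
    for i in order:
        box, cls, _ = detections[i]
        best_j, best_iou = None, iou_thresh
        for j, (gt_box, gt_cls) in enumerate(ground_truth):
            if gt_cls != cls or j in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            results[i] = 'FP'
        else:
            matched.add(best_j)
            results[i] = 'TP'
    return results
```

Under this rule, a second detection of an already matched sign counts as a false positive, as does a well-placed box with the wrong class.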
+
+
+### Pre-trained models
+
+We provide pre-trained models in the form of [PyTorch model files](https://pytorch.org/tutorials/beginner/saving_loading_models.html) for the line segmentation network as well as the sign detector.
+
+| Model name     | Model type        | Train annotations  |
+|----------------|-------------------|------------------------|
+| [lineNet_basic_vpub.pth](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/model_weights/lineNet_basic_vpub.pth) | line segmentation | 410 lines  |
+
+For the sign detector, we provide the best weakly supervised model (fpn_net_vA) and the best semi-supervised model (fpn_net_vF).
+
+| Model name     | Model type        | Weak supervision in training  | Annotations in training  |  mAP on test_full  |
+|----------------|-------------------|-------------------|------------------------|------------------------|
+| [fpn_net_vA.pth](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/model_weights/fpn_net_vA.pth) | sign detector | saa01, saa05, saa08, saa10, saa13, saa16 | None  | 45.3  |
+| [fpn_net_vF.pth](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/model_weights/fpn_net_vF.pth) | sign detector | saa01, saa05, saa08, saa10, saa13, saa16 | train_full (4663 bboxes)  | 65.6  |
+
+
+
+
+### Web application
+
+We also provide a demo web application that enables a user to apply a trained cuneiform sign detector to a large collection of tablet images.
+The code of the web front-end is available in the [webapp repo](https://github.com/compvis/cuneiform-sign-detection-webapp/).
+The back-end code is part of this repository and is located in [lib/webapp/](./lib/webapp/).
+The short animation below shows how the sign detector is used with this web interface.
+
+<img src="http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/functions/demo_cuneiform_sign_detection.gif" alt="Web interface detection" width="700"/>
+
+
+For demonstration purposes, we also host an instance of the web application: [Demo Web Application](http://cunei.iwr.uni-heidelberg.de/cuneiformbrowser/).
+If you would like to test the web application, please contact us for user credentials to log in.
+Please note that this web application is a prototype for demonstration purposes only and not a production system.
+In case the website is not reachable, or other technical issues occur, please contact us.
+
+
+
+### Cuneiform font
+
+For visualization of the cuneiform characters, we recommend installing the [Unicode Cuneiform Fonts](https://www.hethport.uni-wuerzburg.de/cuneifont/) by Sylvie Vanseveren.
+
+
+## Installation
+
+#### Software
+Install general dependencies:
+
+- **OpenGM** with python wrapper - library for discrete graphical models. http://hciweb2.iwr.uni-heidelberg.de/opengm/  
+This library is needed for the alignment step during training. Testing is not affected. An installation guide for Ubuntu 14.04 can be found [here](./install_opengm.md).
+
+- Python 2.7.X
+
+- Python packages:
+    - torch 1.0
+    - torchvision
+    - scikit-image 0.14.0
+    - pandas, scipy, sklearn, jupyter
+    - pillow, tqdm, tensorboardX, nltk, Levenshtein, editdistance, easydict
+
+
+Clone this repository and place the [*cuneiform-sign-detection-dataset*](https://github.com/compvis/cuneiform-sign-detection-dataset) in the [./data sub-folder](./data/).
+
+#### Hardware
+
+Training and evaluation can be performed on a machine with a single GPU (we used a GeForce GTX 1080).
+The demo web application can run on a web server without GPU support,
+since detection inference with a lightweight MobileNetV2 backbone is fast even in CPU only mode
+(less than 1s for an image with HD resolution, less than 10s for 4K resolution).
+
+### References
+This repository also includes external code. In particular, we want to mention:
+> - kuangliu's *torchcv* and *pytorch-cifar* repositories from which we adapted the SSD and FPN detector code:
+ https://github.com/kuangliu/pytorch-cifar and
+ https://github.com/kuangliu/torchcv
+> - Ross Girshick's *py-faster-rcnn* repository from which we adapted part of our evaluation routine:
+ https://github.com/rbgirshick/py-faster-rcnn
+> - Rico Sennrich's *Bleualign* repository from which we adapted part of the Bleualign implementation:
+ https://github.com/rsennrich/Bleualign
@@ -0,0 +1 @@
+theme: jekyll-theme-cayman
@@ -0,0 +1,15 @@
+### Data folder
+
+Place [*cuneiform-sign-detection-dataset*](https://github.com/to3i/cuneiform-sign-detection-dataset)  folders here:
+- ./data/annotations
+- ./data/images
+- ./data/segments
+- ./data/transliterations
+
+#### Metadata files:
+
+- *cunei_mzl.csv* contains the sign code class index established by Borger's Mesopotamisches Zeichenlexikon (MZL)
+- *newLabels.json* contains new labels (re-indexing) for the subset of Neo-Assyrian MZL code classes, so that labels range from 0 to 360 instead of 0 to 910, which reduces the output dimension of the detector
+- *unicode_sign_stats.csv* contains estimates for sign length and height for individual cuneiform sign classes. These estimates were derived from the [Unicode Cuneiform Fonts](https://www.hethport.uni-wuerzburg.de/cuneifont/) by Sylvie Vanseveren. 
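
As an illustration of the re-indexing, the sketch below assumes that *newLabels.json* is a flat list in which the position is the MZL class index and the value is the new training label, with 0 marking classes that have no training label. The excerpt contains only the first eight entries of the file, and the helper `mzl_to_train_label` is hypothetical.

```python
import json

# First eight entries of newLabels.json (excerpt for illustration only).
NEW_LABELS_EXCERPT = json.loads('[0, 2, 0, 191, 0, 184, 196, 238]')


def mzl_to_train_label(mzl_index, new_labels):
    """Map an MZL class index to its re-indexed training label.

    Returns None when the index is out of range or maps to 0, which is
    assumed here to mean that no training label is assigned.
    """
    if mzl_index < 0 or mzl_index >= len(new_labels):
        return None
    label = new_labels[mzl_index]
    return label if label != 0 else None
```

With the full file loaded via `json.load`, the same lookup would apply to every MZL class index, provided the assumed layout holds.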
+
+
@@ -0,0 +1 @@
+[0, 2, 0, 191, 0, 184, 196, 238, 239, 40, 26, 221, 240, 241, 24, 109, 73, 236, 210, 0, 205, 0, 242, 0, 58, 0, 133, 0, 0, 243, 0, 199, 244, 0, 0, 0, 0, 245, 0, 0, 0, 0, 0, 0, 246, 0, 0, 0, 0, 247, 0, 0, 0, 0, 0, 0, 0, 248, 0, 0, 0, 249, 0, 0, 250, 208, 0, 0, 0, 0, 0, 193, 0, 251, 0, 252, 0, 0, 0, 162, 154, 0, 0, 0, 66, 23, 48, 0, 0, 108, 228, 129, 140, 0, 0, 0, 0, 253, 41, 97, 0, 254, 0, 0, 0, 255, 256, 0, 257, 258, 4, 8, 17, 31, 0, 202, 0, 224, 83, 213, 49, 259, 260, 0, 0, 0, 0, 20, 0, 175, 207, 261, 90, 0, 6, 0, 156, 93, 189, 152, 110, 7, 76, 64, 0, 0, 0, 0, 145, 262, 263, 194, 264, 265, 0, 0, 0, 266, 0, 0, 267, 0, 29, 0, 78, 268, 142, 231, 269, 0, 63, 0, 89, 270, 271, 272, 34, 273, 186, 0, 101, 107, 274, 275, 104, 0, 0, 0, 276, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 277, 0, 278, 0, 0, 0, 69, 0, 279, 0, 0, 280, 0, 0, 281, 0, 0, 0, 0, 0, 176, 215, 116, 0, 0, 0, 0, 0, 0, 282, 0, 283, 0, 0, 0, 284, 0, 157, 0, 0, 0, 61, 0, 0, 0, 232, 206, 5, 0, 0, 0, 80, 124, 222, 36, 0, 0, 183, 195, 84, 160, 237, 0, 0, 0, 119, 0, 0, 0, 285, 71, 0, 0, 0, 229, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 182, 0, 0, 132, 286, 0, 0, 287, 115, 52, 0, 197, 233, 209, 0, 0, 0, 0, 0, 0, 200, 0, 204, 288, 46, 0, 0, 289, 0, 0, 0, 0, 0, 0, 0, 290, 0, 50, 0, 0, 0, 0, 0, 291, 0, 0, 0, 292, 0, 293, 192, 294, 295, 0, 0, 0, 0, 0, 0, 296, 0, 25, 120, 297, 212, 123, 0, 146, 134, 21, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 298, 299, 0, 0, 0, 300, 141, 44, 62, 45, 0, 0, 0, 0, 0, 138, 0, 0, 0, 0, 301, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 302, 0, 0, 118, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 303, 0, 0, 0, 304, 305, 0, 0, 306, 0, 165, 307, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 308, 0, 0, 0, 113, 309, 137, 0, 0, 28, 0, 0, 310, 0, 159, 0, 0, 0, 0, 0, 0, 0, 0, 181, 99, 158, 311, 0, 0, 0, 30, 102, 0, 74, 177, 3, 126, 312, 19, 67, 188, 130, 128, 313, 178, 0, 0, 163, 0, 314, 0, 42, 315, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 143, 0, 0, 0, 0, 225, 72, 0, 
139, 223, 0, 0, 316, 57, 317, 318, 319, 13, 35, 320, 321, 217, 322, 179, 190, 121, 65, 150, 0, 323, 324, 148, 96, 0, 0, 226, 325, 0, 326, 327, 0, 219, 328, 98, 43, 87, 0, 0, 234, 329, 112, 0, 60, 0, 18, 198, 136, 330, 0, 0, 331, 39, 0, 155, 27, 0, 92, 0, 0, 0, 0, 0, 0, 332, 0, 0, 333, 105, 0, 334, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 218, 0, 0, 94, 135, 0, 0, 103, 174, 0, 0, 0, 75, 32, 0, 201, 187, 0, 335, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 211, 0, 0, 0, 0, 0, 0, 0, 0, 122, 0, 0, 0, 0, 0, 167, 0, 0, 22, 168, 0, 0, 230, 12, 0, 0, 336, 85, 166, 0, 227, 0, 53, 337, 0, 37, 0, 0, 185, 0, 338, 339, 171, 0, 0, 173, 0, 0, 86, 340, 0, 153, 0, 0, 0, 0, 341, 216, 342, 343, 0, 79, 180, 144, 0, 0, 125, 0, 161, 169, 9, 0, 0, 77, 100, 0, 0, 0, 344, 0, 0, 214, 131, 55, 95, 14, 0, 47, 345, 0, 81, 56, 117, 106, 0, 0, 0, 235, 33, 0, 0, 0, 0, 346, 347, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 348, 0, 349, 0, 0, 0, 0, 0, 0, 350, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 114, 38, 351, 352, 0, 82, 353, 0, 164, 354, 0, 355, 0, 0, 172, 0, 0, 356, 51, 357, 358, 88, 0, 359, 0, 360, 127, 54, 361, 147, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 362, 0, 363, 0, 111, 0, 0, 1, 0, 220, 364, 365, 366, 0, 0, 0, 367, 10, 0, 0, 0, 0, 0, 0, 0, 368, 0, 0, 0, 369, 0, 170, 203, 0, 0, 151, 0, 91, 0, 370, 371, 372, 0, 0, 0, 0, 0, 149, 59, 0, 0, 0, 0, 373, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 374, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
@@ -0,0 +1,239 @@
+train_lbl,width,height
+1,0.71875,0.9375
+2,0.9296875,0.9375
+3,1.71875,0.9375
+4,1.546875,0.9375
+5,1.6640625,1.0234375
+6,1.8125,0.9375
+7,1.6484375,0.9375
+8,1.2109375,1.0703125
+9,2.7421875,1.0625
+10,0.4765625,1.0703125
+11,0.7109375,0.9375
+12,1.71875,1.0
+13,1.171875,0.9375
+14,0.453125,0.9375
+15,1.3125,0.9375
+16,0.4375,0.9375
+17,0.9375,1.015625
+18,0.984375,0.9375
+19,1.3046875,0.9375
+20,1.890625,1.078125
+21,1.2421875,0.984375
+22,1.296875,0.9375
+23,2.109375,1.078125
+24,1.4296875,1.0546875
+25,1.7890625,1.0078125
+26,1.171875,0.9375
+27,1.203125,0.9375
+28,1.171875,0.9375
+29,2.96875,0.96875
+30,1.84375,0.9375
+31,1.21875,0.9375
+32,1.5078125,0.9453125
+33,1.5859375,1.0390625
+34,1.5234375,0.9375
+35,1.46875,0.9375
+36,1.2578125,0.9375
+37,1.7109375,0.9375
+38,1.4453125,1.0703125
+39,0.7421875,0.9375
+40,1.171875,0.9375
+41,1.2421875,0.9375
+42,1.7265625,0.9921875
+43,0.65625,0.9375
+44,0.9453125,0.9375
+45,1.2578125,1.015625
+46,2.3046875,0.9375
+47,1.3515625,0.9375
+48,1.6015625,1.0
+49,0.9296875,0.9375
+50,2.53125,1.0625
+51,0.703125,0.9375
+52,1.5234375,0.984375
+53,1.4921875,0.9453125
+54,0.8828125,0.9375
+55,1.0,0.9375
+56,2.0703125,0.9375
+57,0.921875,0.9375
+58,2.0859375,1.0703125
+59,1.3828125,1.0
+60,2.09375,0.9375
+61,2.15625,0.9453125
+62,0.9375,0.9375
+63,1.625,0.9453125
+64,1.59375,0.9453125
+65,1.6328125,0.9375
+66,2.53125,0.984375
+67,1.5,0.9453125
+68,0.671875,0.9375
+69,2.1171875,1.0546875
+70,2.5546875,0.9921875
+71,2.3125,1.078125
+72,2.25,1.0703125
+73,1.6015625,1.0234375
+74,3.1328125,1.0546875
+75,1.296875,0.9375
+76,1.7265625,0.9375
+77,1.4453125,0.9375
+78,1.1640625,1.078125
+79,1.3671875,0.9375
+80,1.2578125,0.9375
+81,1.1484375,0.9375
+82,1.515625,1.0546875
+83,1.515625,0.9375
+84,1.7421875,0.9375
+85,1.6953125,0.953125
+86,0.9453125,0.9375
+87,1.6015625,0.9375
+88,1.390625,1.0625
+89,1.1875,0.9453125
+90,1.4140625,0.9453125
+91,1.9921875,1.0546875
+92,2.40625,0.9375
+93,2.0625,0.9453125
+94,0.671875,0.9375
+95,1.234375,0.9375
+96,1.1953125,1.046875
+97,1.171875,0.9453125
+98,0.671875,0.9375
+99,1.5078125,0.9375
+100,1.46875,1.078125
+101,1.4140625,1.0703125
+102,1.6328125,0.9375
+103,1.59375,0.9375
+104,2.1953125,0.9375
+105,0.71875,0.9375
+106,1.6328125,1.0390625
+107,1.4140625,1.0546875
+108,1.4921875,0.9375
+109,1.6171875,1.03125
+110,1.59375,0.9375
+111,0.765625,0.9453125
+112,1.875,0.9375
+113,0.9296875,0.9375
+114,1.6015625,1.0703125
+115,2.0,1.0
+116,1.2734375,0.9375
+117,1.5859375,1.03125
+118,1.78125,1.0234375
+119,2.109375,1.03125
+120,1.984375,1.0546875
+121,1.546875,0.9375
+122,1.3046875,0.9375
+123,1.765625,1.078125
+124,1.265625,0.9375
+125,2.0703125,0.9375
+126,1.5390625,0.9375
+127,2.3515625,1.078125
+128,1.6171875,0.9375
+129,1.75,1.0625
+130,0.8515625,0.9375
+131,0.890625,0.9453125
+132,1.84375,1.0546875
+133,2.2890625,1.0859375
+134,1.5703125,0.96875
+135,0.671875,0.9375
+136,0.75,0.9375
+137,2.53125,1.078125
+138,1.1875,0.9375
+139,1.2890625,0.9375
+140,0.9296875,0.9375
+141,0.8515625,0.9375
+142,2.046875,1.0703125
+143,1.625,0.9375
+144,3.0546875,0.9453125
+145,1.4296875,0.9453125
+146,2.359375,1.03125
+147,0.8515625,0.9375
+148,1.640625,0.9375
+149,1.859375,1.0078125
+150,1.1328125,0.9375
+151,2.609375,1.078125
+152,1.7890625,0.9375
+153,0.984375,0.9375
+154,1.953125,0.9765625
+155,1.46875,0.9375
+156,1.65625,0.9375
+157,1.96875,0.9375
+158,1.5078125,1.078125
+159,1.8828125,1.046875
+160,2.0625,1.0078125
+161,2.671875,1.0625
+162,2.296875,0.984375
+163,1.7109375,0.9375
+164,1.5234375,1.09375
+165,0.9296875,0.9609375
+166,2.2734375,1.03125
+167,1.25,0.9375
+168,2.1640625,0.9375
+169,2.84375,1.078125
+170,1.1796875,1.0
+171,2.484375,0.9375
+172,3.9609375,1.0078125
+173,0.8515625,0.9375
+174,1.796875,0.9375
+175,2.2578125,1.0546875
+176,1.046875,0.9375
+177,1.671875,0.9375
+178,1.828125,1.0
+179,1.4765625,0.9375
+180,2.5546875,1.0703125
+181,1.5078125,0.9453125
+182,3.0078125,1.078125
+183,1.4921875,0.9375
+184,1.3359375,0.9375
+185,1.2890625,0.9375
+186,2.578125,0.9375
+187,1.59375,0.9453125
+188,1.5234375,0.9375
+189,3.828125,0.9375
+190,1.28125,0.9375
+191,1.125,0.9375
+192,2.0625,0.9375
+193,1.640625,0.96875
+194,1.0234375,0.9375
+195,1.7421875,0.9375
+196,1.5859375,0.9375
+197,1.84375,0.9375
+198,1.6875,0.9375
+199,2.171875,1.0703125
+200,1.3359375,0.9375
+201,1.953125,0.9375
+202,1.7578125,1.0
+203,1.7734375,1.078125
+204,2.203125,0.9375
+205,1.515625,1.046875
+206,2.234375,0.9375
+207,1.34375,0.9375
+208,2.2109375,1.0625
+209,1.3671875,0.9375
+210,1.4609375,1.0234375
+211,2.5078125,1.0703125
+212,1.765625,1.0703125
+213,0.9296875,1.0625
+214,1.859375,0.9375
+215,2.234375,1.015625
+216,1.8671875,1.078125
+217,1.7890625,1.0703125
+218,0.59375,0.9375
+219,0.53125,0.9375
+220,0.8046875,0.9375
+221,1.9453125,0.9375
+223,2.328125,1.03125
+224,1.5859375,0.9375
+225,1.3046875,0.9375
+226,1.7265625,1.046875
+227,1.84375,0.9375
+228,1.5234375,0.9375
+229,2.6796875,1.046875
+230,1.53125,0.984375
+231,1.8046875,0.953125
+232,1.25,0.9375
+233,1.5859375,0.9375
+234,2.0625,0.9453125
+235,1.5859375,1.03125
+236,1.84375,1.0234375
+237,2.171875,1.0703125
+238,1.5859375,0.9375
+239,1.6875,0.9375
@@ -0,0 +1,6 @@
+### Train & eval line segmentation network
+
+- use `train_line_segmentation.ipynb` for training and `test_line_segmentation.ipynb` for evaluation
+
+### Pre-processing before iterative training
+- use `precompute_line_segmentations.ipynb` to obtain line detections for all tablet images in the training set before iterative training starts
@@ -0,0 +1,301 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Pre-compute and store line segmentations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from PIL import Image\n",
+    "from ast import literal_eval"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# torch\n",
+    "import torch\n",
+    "import torchvision\n",
+    "# addons\n",
+    "from tqdm import tqdm"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "#%pylab inline\n",
+    "%matplotlib inline\n",
+    "import matplotlib.pyplot as plt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# for auto-reloading external modules\n",
+    "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
+    "%load_ext autoreload\n",
+    "%autoreload 2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "relative_path = '../../'\n",
+    "# ensure that parent path is on the python path in order to have all packages available\n",
+    "import sys, os\n",
+    "parent_path = os.path.join(os.getcwd(), relative_path)\n",
+    "parent_path = os.path.realpath(parent_path)  # os.path.abspath(...)\n",
+    "sys.path.insert(0, parent_path)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "from lib.models.trained_model_loader import get_line_net_fcn\n",
+    "from lib.datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta\n",
+    "from lib.transliteration.sign_labels import get_label_list\n",
+    "from lib.utils.transform_utils import UnNormalize\n",
+    "\n",
+    "from lib.detection.run_gen_line_detection import gen_line_detections"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Config Basics"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# toggle generation\n",
+    "save_line_detections = True"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# set line segmentation network\n",
+    "line_model_version = 'v002'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# dataset config\n",
+    "collections = ['test', 'train', 'saa01',  'saa05', 'saa06', 'saa08', 'saa09', 'saa10', 'saa13', 'saa16'] \n",
+    "#collections = ['saa01', 'saa05']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "data_layer_params = dict(batch_size=[128, 16],\n",
+    "                         img_channels=1,\n",
+    "                         gray_mean=[0.5],\n",
+    "                         gray_std=[1.0], \n",
+    "                         num_classes = 2\n",
+    "                         )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Config Data Augmentation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "num_classes = data_layer_params['num_classes']\n",
+    "num_c = data_layer_params['img_channels']\n",
+    "gray_mean = data_layer_params['gray_mean']\n",
+    "gray_std = data_layer_params['gray_std']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "re_transform = torchvision.transforms.Compose([\n",
+    "    UnNormalize(mean=gray_mean, std=gray_std),\n",
+    "    torchvision.transforms.ToPILImage(),\n",
+    "                                              ])\n",
+    "re_transform_rgb = torchvision.transforms.Compose([\n",
+    "    UnNormalize(mean=gray_mean * 3, std=gray_std * 3),\n",
+    "    torchvision.transforms.ToPILImage(),\n",
+    "                                              ])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "#use_gpu = torch.cuda.is_available()\n",
+    "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "model_fcn = get_line_net_fcn(line_model_version, device, num_classes=num_classes, num_c=num_c)\n",
+    "print(model_fcn)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Run experiment"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true,
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "for saa_version in collections:\n",
+    "    print('collection: <><>{}<><>'.format(saa_version))\n",
+    "    \n",
+    "    ### Get collection dataset\n",
+    "    dataset = CuneiformSegments(collections=[saa_version], relative_path=relative_path, \n",
+    "                                only_annotated=False, only_assigned=True, preload_segments=False)\n",
+    "    \n",
+    "    # filter collection dataset - OPTIONAL\n",
+    "    didx_list = range(len(dataset))\n",
+    "    \n",
+    "    ### Generate line detections\n",
+    "    gen_line_detections(didx_list, dataset, saa_version, relative_path,\n",
+    "                        line_model_version, model_fcn, re_transform, device,\n",
+    "                        save_line_detections)     "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
@@ -0,0 +1,11 @@
+### Perform sign detector training
+After the sign annotations (aligned and placed detections) have been generated and stored under `results/results_ssd/` using the scripts in `scripts/generate/`,
+the sign detector is trained by performing the following steps:
+
+1) use `train_sign_classifier.ipynb` as a template to train the sign classifier
+2) use `train_sign_detector.ipynb` as a template to train the sign detector (initialized with the pre-trained sign classifier from step 1)
+3) in the semi-supervised case, use `finetune_sign_detector.ipynb` to fine-tune the sign detector on manual annotations
+
+### Eval sign detector
+
+- use `test_sign_detector.ipynb` for evaluation of the sign detector
@@ -0,0 +1,623 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Fine-tune sign detector network (in semi-supervised case)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from PIL import Image\n",
+    "from ast import literal_eval\n",
+    "import os.path\n",
+    "from tqdm import tqdm\n",
+    "import copy\n",
+    "\n",
+    "import torch\n",
+    "import torch.optim as optim\n",
+    "from torch.optim import lr_scheduler\n",
+    "import torch.utils.data as data\n",
+    "\n",
+    "from torchvision import transforms as trafos\n",
+    "import torchvision.transforms as transforms"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# for auto-reloading external modules\n",
+    "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
+    "%load_ext autoreload\n",
+    "%autoreload 2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "relative_path = '../../'\n",
+    "# ensure that parent path is on the python path in order to have all packages available\n",
+    "import sys, os\n",
+    "parent_path = os.path.join(os.getcwd(), relative_path)\n",
+    "parent_path = os.path.realpath(parent_path)  # os.path.abspath(...)\n",
+    "sys.path.insert(0, parent_path)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "from lib.datasets.cunei_dataset_ssd import CuneiformSSD\n",
+    "\n",
+    "from lib.alignment.LineFragment import plot_boxes\n",
+    "from lib.utils.pytorch_utils import get_tensorboard_writer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "from lib.models.mobilenetv2_mod03 import MobileNetV2\n",
+    "from lib.models.mobilenetv2_fpn import MobileNetV2FPN\n",
+    "from lib.models.trained_model_loader import get_fpn_ssd_net\n",
+    "from lib.utils.torchcv.models.net import FPNSSD\n",
+    "from lib.utils.torchcv.loss.ssd_loss import SSDLoss"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "hh = 0.001\n",
+    "## time.sleep(60*60*hh)\n",
+    "for i in tqdm(range(int(6*60*hh))):\n",
+    "    time.sleep(10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Config Basics"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "model_version = 'v001ft01'\n",
+    "\n",
+    "# config pretrained detector\n",
+    "pretrained_model_version = 'v001'  # 'v191'  \n",
+    "\n",
+    "# config datasets for training and testing\n",
+    "train_collections = ['train_D'] \n",
+    "test_collections =  ['testEXT']  # ['test_full']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# config generated data\n",
+    "with_gen_data = False\n",
+    "\n",
+    "gen_model_version = 'v001'  \n",
+    "\n",
+    "gen_folder = 'results_ssd/{}/'.format(gen_model_version)  \n",
+    "gen_file_path = None\n",
+    "\n",
+    "gen_collections = ['saa01', 'saa05', 'saa08', 'saa10', 'saa13', 'saa16']\n",
+    "gen_collections += ['train']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# config backbone architecture\n",
+    "arch_opt = 1\n",
+    "arch_type = 'mobile'\n",
+    "width_mult = 0.625\n",
+    "\n",
+    "# config detector\n",
+    "with_64 = False\n",
+    "create_bg_class = False\n",
+    "img_size = 512\n",
+    "num_classes = 240\n",
+    "\n",
+    "# config schedule\n",
+    "num_epochs = 11 \n",
+    "lr_milestones = [60]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# set log file name\n",
+    "if with_gen_data:\n",
+    "    version_remark = '{}_fpnssd_mobilenetv2_{}_gen_{}'\n",
+    "    version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version, gen_model_version)\n",
+    "else:\n",
+    "    version_remark = '{}_fpnssd_mobilenetv2_{}'\n",
+    "    version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Preparing Datasets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:\n",
+    "    from lib.utils.torchcv.box_coder_retina_lm import RetinaBoxCoder\n",
+    "    from lib.utils.torchcv.transforms_lm.resize import resize_lm\n",
+    "    from lib.utils.torchcv.transforms_lm.random_crop_tile import random_crop_tile_lm\n",
+    "    from lib.utils.torchcv.transforms_lm.pad_gs import pad_lm\n",
+    "else:\n",
+    "    from lib.utils.torchcv.box_coder_retina import RetinaBoxCoder\n",
+    "    from lib.utils.torchcv.transforms.resize import resize\n",
+    "    from lib.utils.torchcv.transforms.random_crop_tile import random_crop_tile\n",
+    "    from lib.utils.torchcv.transforms.pad_gs import pad"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "box_coder = RetinaBoxCoder(create_bg_class=create_bg_class)\n",
+    "print('num_anchors', len(box_coder.anchor_boxes))\n",
+    "print('anchor side lengths (sqrt of areas)', np.sqrt(box_coder.anchor_areas))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:    \n",
+    "    def transform_train(img, boxes, labels, linemap):\n",
+    "        # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
+    "        img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0), \n",
+    "                                       transforms.Lambda(lambda x: x)  # identity\n",
+    "                                      ])(img)  \n",
+    "        img, linemap = pad_lm(img, linemap, (600, 600))\n",
+    "        img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
+    "        img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
+    "        img = transforms.Compose([\n",
+    "            transforms.ToTensor(),\n",
+    "            transforms.Normalize(mean=[0.5], std=[1.0])\n",
+    "        ])(img)\n",
+    "        boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
+    "\n",
+    "        return img, boxes, labels, transforms.ToTensor()(linemap)\n",
+    "else:\n",
+    "    def transform_train(img, boxes, labels):\n",
+    "        # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
+    "        img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0), \n",
+    "                                       transforms.Lambda(lambda x: x)  # identity\n",
+    "                                      ])(img)  \n",
+    "        img = pad(img, (600, 600))\n",
+    "        img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
+    "        img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
+    "        img = transforms.Compose([\n",
+    "            transforms.ToTensor(),\n",
+    "            transforms.Normalize(mean=[0.5], std=[1.0])\n",
+    "        ])(img)\n",
+    "        boxes, labels = box_coder.encode(boxes, labels)\n",
+    "        return img, boxes, labels"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:\n",
+    "    trainset = CuneiformSSD(collections=train_collections, transform=transform_train, \n",
+    "                            gen_file_path=gen_file_path, gen_collections=gen_collections, gen_folder=gen_folder, \n",
+    "                            relative_path=relative_path, use_balanced_idx=False, use_linemaps=True, \n",
+    "                            remove_empty_tiles=False, min_align_ratio=0.2)\n",
+    "else:\n",
+    "    trainset = CuneiformSSD(collections=train_collections, transform=transform_train,\n",
+    "                            gen_file_path=gen_file_path, relative_path=relative_path, use_linemaps=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:\n",
+    "    def transform_test(img, boxes, labels, linemap):\n",
+    "        img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
+    "        img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
+    "        img = transforms.Compose([\n",
+    "            transforms.ToTensor(),\n",
+    "            transforms.Normalize(mean=[0.5],std=[1.0])\n",
+    "        ])(img)\n",
+    "        boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
+    "        return img, boxes, labels, transforms.ToTensor()(linemap)\n",
+    "else:\n",
+    "    def transform_test(img, boxes, labels):\n",
+    "        img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",
+    "        img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
+    "        img = transforms.Compose([\n",
+    "            transforms.ToTensor(),\n",
+    "            transforms.Normalize(mean=[0.5],std=[1.0])\n",
+    "        ])(img)\n",
+    "        boxes, labels = box_coder.encode(boxes, labels)\n",
+    "        return img, boxes, labels"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:\n",
+    "    testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
+    "                           gen_file_path=None, relative_path=relative_path, use_linemaps=True)\n",
+    "else:\n",
+    "    testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
+    "                           gen_file_path=None, relative_path=relative_path, use_linemaps=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "trainloader = data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=3)\n",
+    "testloader = data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Building Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "device = 'cuda' if torch.cuda.is_available() else 'cpu'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# load FPN model from pretrained detector model\n",
+    "fpnssd_net = get_fpn_ssd_net(pretrained_model_version, device, arch_type, with_64, arch_opt, width_mult, \n",
+    "                             relative_path, num_classes, num_c=1)\n",
+    "fpnssd_net.train()\n",
+    "\n",
+    "# print model\n",
+    "print(fpnssd_net)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# sanity check: forward pass on a random single-channel input\n",
+    "loc_preds, cls_preds = fpnssd_net(torch.randn(1, 1, img_size, img_size).to(device))\n",
+    "print(loc_preds.size(), cls_preds.size())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Optimization"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "criterion = SSDLoss(num_classes=num_classes)\n",
+    "#criterion = FocalLoss(num_classes=num_classes)\n",
+    "optimizer = optim.SGD(fpnssd_net.parameters(), lr=0.0001, momentum=0.9, weight_decay=1e-4)\n",
+    "\n",
+    "# lr policy\n",
+    "# scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.97)\n",
+    "scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=lr_milestones, gamma=0.1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# init logger\n",
+    "if version_remark == '':\n",
+    "    comment_str = '_{}'.format(model_version)\n",
+    "else:\n",
+    "    comment_str = '_{}_{}'.format(model_version, version_remark)\n",
+    "writer = get_tensorboard_writer(logs_folder='{}results/run_logs/detector'.format(relative_path), comment=comment_str)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# Training\n",
+    "best_loss = float('inf')  # best test loss\n",
+    "best_epoch = 0\n",
+    "best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
+    "\n",
+    "\n",
+    "def train(epoch):\n",
+    "    fpnssd_net.train()\n",
+    "    train_loss = 0\n",
+    "\n",
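+    "    scheduler.step()  # stepped at epoch start (pre-1.1 PyTorch convention); PyTorch >= 1.1 expects this call after the optimizer steps\n",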
+    "\n",
+    "    if with_gen_data:\n",
+    "        for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(trainloader):\n",
+    "            inputs = inputs.to(device)\n",
+    "            loc_targets = loc_targets.to(device)\n",
+    "            cls_targets = cls_targets.to(device)\n",
+    "\n",
+    "            optimizer.zero_grad()\n",
+    "            loc_preds, cls_preds = fpnssd_net(inputs)\n",
+    "            loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
+    "            loss.backward()\n",
+    "            optimizer.step()\n",
+    "\n",
+    "            train_loss += loss.item()\n",
+    "            print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
+    "                  % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
+    "    else:\n",
+    "        for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(trainloader):\n",
+    "            inputs = inputs.to(device)\n",
+    "            loc_targets = loc_targets.to(device)\n",
+    "            cls_targets = cls_targets.to(device)\n",
+    "\n",
+    "            optimizer.zero_grad()\n",
+    "            loc_preds, cls_preds = fpnssd_net(inputs)\n",
+    "            loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
+    "            loss.backward()\n",
+    "            optimizer.step()\n",
+    "\n",
+    "            train_loss += loss.item()\n",
+    "            print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
+    "                  % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
+    "\n",
+    "    # write to logger\n",
+    "    phase = 'train'\n",
+    "    writer.add_scalar('data/{}/loss'.format(phase), train_loss / len(trainloader), epoch)\n",
+    "\n",
+    "def test(epoch):\n",
+    "    fpnssd_net.eval()\n",
+    "    test_loss = 0\n",
+    "    with torch.no_grad():\n",
+    "\n",
+    "        if with_gen_data:\n",
+    "            for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(testloader):\n",
+    "                inputs = inputs.to(device)\n",
+    "                loc_targets = loc_targets.to(device)\n",
+    "                cls_targets = cls_targets.to(device)\n",
+    "\n",
+    "                loc_preds, cls_preds = fpnssd_net(inputs)\n",
+    "                loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
+    "                test_loss += loss.item()\n",
+    "                print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
+    "                      % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
+    "        else:\n",
+    "            for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(testloader):\n",
+    "                inputs = inputs.to(device)\n",
+    "                loc_targets = loc_targets.to(device)\n",
+    "                cls_targets = cls_targets.to(device)\n",
+    "\n",
+    "                loc_preds, cls_preds = fpnssd_net(inputs)\n",
+    "                loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
+    "                test_loss += loss.item()\n",
+    "                print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
+    "                      % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
+    "\n",
+    "    # write to logger\n",
+    "    phase = 'test'\n",
+    "    writer.add_scalar('data/{}/loss'.format(phase), test_loss / len(testloader), epoch)\n",
+    "\n",
+    "    # save the weights to disk when the test loss improves (only after epoch 5)\n",
+    "    global best_loss\n",
+    "    global best_epoch\n",
+    "    test_loss /= len(testloader)\n",
+    "    if test_loss < best_loss and epoch > 5:\n",
+    "        # best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
+    "        weights_path = '{}results/weights/fpn_net_{}_best.pth'.format(relative_path, model_version)\n",
+    "        torch.save(fpnssd_net.state_dict(), weights_path)\n",
+    "        best_epoch = epoch\n",
+    "        best_loss = test_loss"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true,
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "for epoch in tqdm(range(num_epochs)):\n",
+    "    print('\\nEpoch: %d' % epoch)\n",
+    "    train(epoch)\n",
+    "    if epoch % 2 == 0:\n",
+    "        print('\\nTest')\n",
+    "        test(epoch)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "print('Best test loss: {:.4f} at epoch {}'.format(best_loss, best_epoch))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# choose model filename\n",
+    "weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)\n",
+    "# Save only the model parameters\n",
+    "torch.save(fpnssd_net.state_dict(), weights_path)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
@@ -0,0 +1,907 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Test and visualize sign detector"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from PIL import Image\n",
+    "from ast import literal_eval\n",
+    "\n",
+    "from tqdm import tqdm\n",
+    "\n",
+    "import torch"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# for auto-reloading external modules\n",
+    "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
+    "%load_ext autoreload\n",
+    "%autoreload 2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "relative_path = '../../'\n",
+    "# ensure that parent path is on the python path in order to have all packages available\n",
+    "import sys, os\n",
+    "parent_path = os.path.join(os.getcwd(), relative_path)\n",
+    "parent_path = os.path.realpath(parent_path)  # os.path.abspath(...)\n",
+    "sys.path.insert(0, parent_path)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/tobias/.virtualenvs/pytorch/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.\n",
+      "  utils.DeprecatedIn23,\n"
+     ]
+    }
+   ],
+   "source": [
+    "from lib.datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta\n",
+    "from lib.models.trained_model_loader import get_fpn_ssd_net\n",
+    "from lib.detection.run_gen_ssd_detection import gen_ssd_detections"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "collections = ['test']   # e.g. 'train', 'test', 'saa06'\n",
+    "only_annotated = True\n",
+    "only_assigned = True\n",
+    "\n",
+    "# store detections for re-use\n",
+    "save_detections = False\n",
+    "\n",
+    "# show detections\n",
+    "show_detections = True"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "model_version = 'v191ft01' \n",
+    "\n",
+    "arch_type = 'mobile'  # options: 'resnet', 'mobile'\n",
+    "arch_opt = 1\n",
+    "width_mult = 0.625  # options: 0.5, 0.625, 0.75\n",
+    "\n",
+    "crop_shape = [600, 600]\n",
+    "tile_shape = [600, 600]\n",
+    "\n",
+    "num_classes = 240\n",
+    "\n",
+    "with_64 = False \n",
+    "create_bg_class = False \n",
+    "with_4_aspects = False  "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id='complconf'>Config Completeness</a>\n",
+    "\n",
+    "[Jump to results](#results)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "test_min_score_thresh = 0.01  # alternatives: 0.01, 0.05\n",
+    "test_nms_thresh = 0.5 \n",
+    "\n",
+    "eval_ovthresh = 0.5  # alternative: 0.4"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "### Load Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "device = 'cuda' if torch.cuda.is_available() else 'cpu'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "FPNSSD(\n",
+      "  (fpn): MobileNetV2FPN(\n",
+      "    (features): Sequential(\n",
+      "      (0): Sequential(\n",
+      "        (0): Conv2d(1, 20, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
+      "        (1): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "        (2): ReLU6(inplace)\n",
+      "      )\n",
+      "      (1): MobileBlock(\n",
+      "        (mobile_block): Sequential(\n",
+      "          (0): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(20, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(20, 20, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=20, bias=False)\n",
+      "              (4): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(20, 10, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "        )\n",
+      "      )\n",
+      "      (2): MobileBlock(\n",
+      "        (mobile_block): Sequential(\n",
+      "          (0): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(10, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(60, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=60, bias=False)\n",
+      "              (4): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(60, 15, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(15, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (1): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(15, 90, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(90, 90, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=90, bias=False)\n",
+      "              (4): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(90, 15, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(15, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "        )\n",
+      "      )\n",
+      "      (3): MobileBlock(\n",
+      "        (mobile_block): Sequential(\n",
+      "          (0): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(15, 90, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(90, 90, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=90, bias=False)\n",
+      "              (4): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(90, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (1): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(20, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(120, 120, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=120, bias=False)\n",
+      "              (4): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(120, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (2): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(20, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(120, 120, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=120, bias=False)\n",
+      "              (4): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(120, 20, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "        )\n",
+      "      )\n",
+      "      (4): MobileBlock(\n",
+      "        (mobile_block): Sequential(\n",
+      "          (0): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(20, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(120, 120, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=120, bias=False)\n",
+      "              (4): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (1): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
+      "              (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (2): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
+      "              (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (3): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
+      "              (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "        )\n",
+      "      )\n",
+      "      (5): MobileBlock(\n",
+      "        (mobile_block): Sequential(\n",
+      "          (0): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(240, 240, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=240, bias=False)\n",
+      "              (4): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(240, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (1): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(60, 360, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(360, 360, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=360, bias=False)\n",
+      "              (4): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(360, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (2): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(60, 360, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(360, 360, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=360, bias=False)\n",
+      "              (4): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(360, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "        )\n",
+      "      )\n",
+      "      (6): MobileBlock(\n",
+      "        (mobile_block): Sequential(\n",
+      "          (0): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(60, 360, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(360, 360, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=360, bias=False)\n",
+      "              (4): BatchNorm2d(360, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(360, 100, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (1): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(100, 600, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(600, 600, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=600, bias=False)\n",
+      "              (4): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(600, 100, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "          (2): InvertedResidual(\n",
+      "            (conv): Sequential(\n",
+      "              (0): Conv2d(100, 600, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (1): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (2): ReLU6(inplace)\n",
+      "              (3): Conv2d(600, 600, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=600, bias=False)\n",
+      "              (4): BatchNorm2d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "              (5): ReLU6(inplace)\n",
+      "              (6): Conv2d(600, 100, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "              (7): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "            )\n",
+      "          )\n",
+      "        )\n",
+      "      )\n",
+      "      (7): Sequential(\n",
+      "        (0): Conv2d(100, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+      "        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+      "        (2): ReLU6(inplace)\n",
+      "      )\n",
+      "      (8): AvgPool2d(kernel_size=7, stride=1, padding=0)\n",
+      "    )\n",
+      "    (conv6): Conv2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))\n",
+      "    (toplayer): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))\n",
+      "  )\n",
+      "  (loc_head): Sequential(\n",
+      "    (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "    (1): ReLU(inplace)\n",
+      "    (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "    (3): ReLU(inplace)\n",
+      "    (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "    (5): ReLU(inplace)\n",
+      "    (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "    (7): ReLU(inplace)\n",
+      "    (8): Conv2d(256, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "  )\n",
+      "  (cls_head): Sequential(\n",
+      "    (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "    (1): ReLU(inplace)\n",
+      "    (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "    (3): ReLU(inplace)\n",
+      "    (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "    (5): ReLU(inplace)\n",
+      "    (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "    (7): ReLU(inplace)\n",
+      "    (8): Conv2d(256, 2880, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+      "  )\n",
+      ")\n"
+     ]
+    }
+   ],
+   "source": [
+    "fpnssd_net = get_fpn_ssd_net(model_version, device, arch_type, with_64, arch_opt, width_mult, \n",
+    "                             relative_path, num_classes, num_c=1, rnd_init_model=False)\n",
+    "\n",
+    "print(fpnssd_net)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(torch.Size([1, 15360, 4]), torch.Size([1, 15360, 240]))\n"
+     ]
+    }
+   ],
+   "source": [
+    "# sanity check: run a random single-channel 1024x1024 input through the net and print the output shapes\n",
+    "loc_preds, cls_preds = fpnssd_net(torch.randn(1, 1, 1024, 1024).to(device))\n",
+    "print(loc_preds.size(), cls_preds.size())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "### Prepare dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Setup dataset spanning 3 collections with 4465 annotations [67 segments, 67 indices]\n"
+     ]
+    }
+   ],
+   "source": [
+    "dataset = CuneiformSegments(collections=collections, relative_path=relative_path, \n",
+    "                            only_annotated=only_annotated, only_assigned=only_assigned, preload_segments=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "### Predict"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "train:   5%|▌         | 1/19 [00:00<00:15,  1.20it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('P334921', 'Obv')\n",
+      "mAP 0.7926 | global AP: 0.7473 | mAP (align): 0.8859\n",
+      "total_tp: 22 | total_fp: 17 [46] | acc: 0.56\n",
+      "('P334921', 'Rev')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  11%|█         | 2/19 [00:01<00:08,  1.98it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "mAP 0.7143 | global AP: 0.7286 | mAP (align): 1.0\n",
+      "total_tp: 7 | total_fp: 1 [9] | acc: 0.88\n",
+      "('P334863', 'Obv')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/tobias/Dropbox/NeuralNets/caffe_workspace/pycaffe/cuneiform-sign-detection/lib/evaluations/sign_evaluator.py:184: RuntimeWarning: invalid value encountered in divide\n",
+      "  return num_tp, num_fp, num_fp_global, num_tp / float(num_tp + num_fp)\n",
+      "\r",
+      "train:  16%|█▌        | 3/19 [00:01<00:06,  2.43it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "mAP 0.0 | global AP: 0.0 | mAP (align): nan\n",
+      "total_tp: 0 | total_fp: 0 [2] | acc: nan\n",
+      "('P334831', 'Rev')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  21%|██        | 4/19 [00:01<00:05,  2.57it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "mAP 0.8409 | global AP: 0.7323 | mAP (align): 0.881\n",
+      "total_tp: 27 | total_fp: 38 [74] | acc: 0.42\n",
+      "('P334831', 'Obv')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  26%|██▋       | 5/19 [00:01<00:05,  2.60it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "mAP 0.9274 | global AP: 0.8946 | mAP (align): 0.9518\n",
+      "total_tp: 59 | total_fp: 55 [97] | acc: 0.52\n",
+      "('P334892', 'Rev')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  32%|███▏      | 6/19 [00:02<00:04,  2.73it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "mAP 0.9236 | global AP: 0.8063 | mAP (align): 0.9236\n",
+      "total_tp: 18 | total_fp: 10 [28] | acc: 0.64\n",
+      "('P334892', 'Obv')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  37%|███▋      | 7/19 [00:02<00:04,  2.91it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "mAP 0.9667 | global AP: 0.9474 | mAP (align): 0.9667\n",
+      "total_tp: 18 | total_fp: 8 [27] | acc: 0.69\n",
+      "('P336635', 'Obv')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  42%|████▏     | 8/19 [00:02<00:03,  2.94it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "mAP 0.9061 | global AP: 0.7393 | mAP (align): 0.9061\n",
+      "total_tp: 13 | total_fp: 8 [35] | acc: 0.62\n",
+      "('P334865', 'Obv')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  47%|████▋     | 9/19 [00:03<00:03,  2.93it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "mAP 0.9157 | global AP: 0.887 | mAP (align): 0.9443\n",
+      "total_tp: 40 | total_fp: 34 [52] | acc: 0.54\n",
+      "('P334865', 'Rev')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "train:  58%|█████▊    | 11/19 [00:03<00:02,  3.14it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "mAP 0.8139 | global AP: 0.8114 | mAP (align): 0.8879\n",
+      "total_tp: 49 | total_fp: 47 [71] | acc: 0.51\n",
+      "('P334842', 'Obv')\n",
+      "mAP 0.0 | global AP: 0.0 | mAP (align): 0.0\n",
+      "total_tp: 0 | total_fp: 0 [0] | acc: 0.0\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  63%|██████▎   | 12/19 [00:03<00:02,  3.16it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('P334848', 'Obv')\n",
+      "mAP 0.9074 | global AP: 0.9346 | mAP (align): 0.9074\n",
+      "total_tp: 22 | total_fp: 14 [25] | acc: 0.61\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  68%|██████▊   | 13/19 [00:04<00:01,  3.23it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('P334848', 'Rev')\n",
+      "mAP 0.9375 | global AP: 0.8014 | mAP (align): 1.0\n",
+      "total_tp: 16 | total_fp: 4 [38] | acc: 0.8\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  74%|███████▎  | 14/19 [00:04<00:01,  3.23it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('P334839', 'Obv')\n",
+      "mAP 0.9333 | global AP: 0.8562 | mAP (align): 0.9956\n",
+      "total_tp: 22 | total_fp: 17 [52] | acc: 0.56\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  79%|███████▉  | 15/19 [00:04<00:01,  3.24it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('P334896', 'Obv')\n",
+      "mAP 0.9375 | global AP: 0.8586 | mAP (align): 0.9375\n",
+      "total_tp: 26 | total_fp: 20 [36] | acc: 0.57\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  84%|████████▍ | 16/19 [00:04<00:00,  3.21it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('P334836', 'Rev')\n",
+      "mAP 1.0 | global AP: 0.9705 | mAP (align): 1.0\n",
+      "total_tp: 45 | total_fp: 24 [65] | acc: 0.65\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  89%|████████▉ | 17/19 [00:05<00:00,  3.17it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('P334894', 'Rev')\n",
+      "mAP 1.0 | global AP: 0.8218 | mAP (align): 1.0\n",
+      "total_tp: 15 | total_fp: 9 [39] | acc: 0.62\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train:  95%|█████████▍| 18/19 [00:05<00:00,  3.14it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('P334894', 'Obv')\n",
+      "mAP 0.8727 | global AP: 0.8457 | mAP (align): 0.8727\n",
+      "total_tp: 41 | total_fp: 39 [62] | acc: 0.51\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\r",
+      "train: 100%|██████████| 19/19 [00:06<00:00,  3.14it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('P336178', 'Obv')\n",
+      "mAP 0.9375 | global AP: 0.8319 | mAP (align): 0.9375\n",
+      "total_tp: 31 | total_fp: 20 [34] | acc: 0.61\n",
+      "train | v191ft01\n",
+      "RESULTS ON FULL COLLECTION :\n",
+      "mAP 0.7739 | global AP: 0.7816 | mAP (align): 0.7958\n",
+      "total_tp: 471 | total_fp: 690 [792] | prec: 0.406\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# optional: restrict evaluation to a subset of the dataset (here the first 19 segments)\n",
+    "didx_list = range(len(dataset))\n",
+    "didx_list = didx_list[:19]\n",
+    "\n",
+    "### Generate ssd detections\n",
+    "(list_seg_ap, \n",
+    " list_seg_name_with_anno) = gen_ssd_detections(didx_list, dataset, collections[0], relative_path, \n",
+    "                                               model_version, fpnssd_net, with_64, create_bg_class, device,\n",
+    "                                               test_min_score_thresh, test_nms_thresh, eval_ovthresh,\n",
+    "                                               save_detections, show_detections, with_4_aspects=with_4_aspects, \n",
+    "                                               verbose_mode=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id='results'>Results</a>\n",
+    "\n",
+    "[Jump to completeness config](#complconf)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "[Jump to Results](#results)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
@@ -0,0 +1,634 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Train sign detector network"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from PIL import Image\n",
+    "from ast import literal_eval\n",
+    "import os.path\n",
+    "from tqdm import tqdm\n",
+    "import copy\n",
+    "\n",
+    "import torch\n",
+    "import torch.optim as optim\n",
+    "from torch.optim import lr_scheduler\n",
+    "import torch.utils.data as data\n",
+    "\n",
+    "from torchvision import transforms as trafos\n",
+    "import torchvision.transforms as transforms"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# for auto-reloading external modules\n",
+    "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
+    "%load_ext autoreload\n",
+    "%autoreload 2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "relative_path = '../../'\n",
+    "# ensure that parent path is on the python path in order to have all packages available\n",
+    "import sys, os\n",
+    "parent_path = os.path.join(os.getcwd(), relative_path)\n",
+    "parent_path = os.path.realpath(parent_path)  # os.path.abspath(...)\n",
+    "sys.path.insert(0, parent_path)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "from lib.datasets.cunei_dataset_ssd import CuneiformSSD\n",
+    "\n",
+    "from lib.alignment.LineFragment import plot_boxes\n",
+    "from lib.utils.pytorch_utils import get_tensorboard_writer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "from lib.models.mobilenetv2_mod03 import MobileNetV2\n",
+    "from lib.models.mobilenetv2_fpn import MobileNetV2FPN\n",
+    "from lib.models.trained_model_loader import get_fpn_ssd_net\n",
+    "from lib.utils.torchcv.models.net import FPNSSD\n",
+    "from lib.utils.torchcv.loss.ssd_loss import SSDLoss"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
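+    "import time\n",
+    "\n",
+    "# optional delayed start: wait roughly `hh` hours before the rest of the notebook runs\n",
+    "# (sleeps in 10 s steps so tqdm can show progress; hh = 0.001 yields int(0.36) = 0 steps, i.e. no delay)\n",
+    "hh = 0.001\n",
+    "## time.sleep(60*60*hh)\n",
+    "for i in tqdm(range(int(6*60*hh))):\n",
+    "    time.sleep(10)"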
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Config Basics"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "model_version = 'v001'\n",
+    "\n",
+    "# config pretrained classifier\n",
+    "pretrained_model_version = 'v001'  #'v239'  \n",
+    "\n",
+    "# config datasets for training and testing\n",
+    "train_collections = ['train_E'] \n",
+    "test_collections =  ['test_full']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# config generated data\n",
+    "with_gen_data = True\n",
+    "\n",
+    "gen_model_version = 'v001'  #'v171_hp04'  \n",
+    "\n",
+    "gen_folder = 'results_ssd/{}/'.format(gen_model_version)  \n",
+    "gen_file_path = None\n",
+    "\n",
+    "gen_collections = ['saa01', 'saa05', 'saa08', 'saa10', 'saa13', 'saa16']\n",
+    "#gen_collections = ['saa01', 'saa05']\n",
+    "gen_collections += ['train']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# config backbone architecture\n",
+    "arch_opt = 1\n",
+    "arch_type = 'mobile'\n",
+    "width_mult = 0.625\n",
+    "\n",
+    "# config detector\n",
+    "with_64 = False\n",
+    "create_bg_class = False\n",
+    "img_size = 512\n",
+    "num_classes = 240\n",
+    "\n",
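+    "# config schedule\n",
+    "# note: MultiStepLR only decays the lr at the listed epochs; a milestone beyond the last epoch is never reached\n",
+    "num_epochs = 51\n",
+    "lr_milestones = [60]"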
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# set log file name\n",
+    "if with_gen_data:\n",
+    "    version_remark = '{}_fpnssd_mobilenetv2_{}_gen_{}'\n",
+    "    version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version, gen_model_version)\n",
+    "else:\n",
+    "    version_remark = '{}_fpnssd_mobilenetv2_{}'\n",
+    "    version_remark = version_remark.format(\"_\".join(train_collections), pretrained_model_version)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Preparing Datasets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:\n",
+    "    from lib.utils.torchcv.box_coder_retina_lm import RetinaBoxCoder\n",
+    "    from lib.utils.torchcv.transforms_lm.resize import resize_lm\n",
+    "    from lib.utils.torchcv.transforms_lm.random_crop_tile import random_crop_tile_lm\n",
+    "    from lib.utils.torchcv.transforms_lm.pad_gs import pad_lm\n",
+    "else:\n",
+    "    from lib.utils.torchcv.box_coder_retina import RetinaBoxCoder\n",
+    "    from lib.utils.torchcv.transforms.resize import resize\n",
+    "    from lib.utils.torchcv.transforms.random_crop_tile import random_crop_tile\n",
+    "    from lib.utils.torchcv.transforms.pad_gs import pad"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "box_coder = RetinaBoxCoder(create_bg_class=create_bg_class)\n",
+    "print('num_anchors', len(box_coder.anchor_boxes))\n",
+    "print('sqrt of anchor areas', np.sqrt(box_coder.anchor_areas))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:    \n",
+    "    def transform_train(img, boxes, labels, linemap):\n",
+    "        # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
+    "        img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0), \n",
+    "                                       transforms.Lambda(lambda x: x)  # identity\n",
+    "                                      ])(img)  \n",
+    "        img, linemap = pad_lm(img, linemap, (600, 600))\n",
+    "        img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
+    "        img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
+    "        img = transforms.Compose([\n",
+    "            transforms.ToTensor(),\n",
+    "            transforms.Normalize(mean=[0.5], std=[1.0])\n",
+    "        ])(img)\n",
+    "        boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
+    "\n",
+    "        return img, boxes, labels, transforms.ToTensor()(linemap)\n",
+    "else:\n",
+    "    def transform_train(img, boxes, labels):\n",
+    "        # img = transforms.ColorJitter(0.3,0.3,0,0)(img)\n",
+    "        img = transforms.RandomChoice([transforms.ColorJitter(0.5,0.5,0,0), \n",
+    "                                       transforms.Lambda(lambda x: x)  # identity\n",
+    "                                      ])(img)  \n",
+    "        img = pad(img, (600, 600))\n",
+    "        img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.65, 1], max_aspect_ratio=1.35)\n",
+    "        img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
+    "        img = transforms.Compose([\n",
+    "            transforms.ToTensor(),\n",
+    "            transforms.Normalize(mean=[0.5], std=[1.0])\n",
+    "        ])(img)\n",
+    "        boxes, labels = box_coder.encode(boxes, labels)\n",
+    "        return img, boxes, labels"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:\n",
+    "    trainset = CuneiformSSD(collections=train_collections, transform=transform_train, \n",
+    "                            gen_file_path=gen_file_path, gen_collections=gen_collections, gen_folder=gen_folder, \n",
+    "                            relative_path=relative_path, use_balanced_idx=False, use_linemaps=True, \n",
+    "                            remove_empty_tiles=False, min_align_ratio=0.2)\n",
+    "else:\n",
+    "    trainset = CuneiformSSD(collections=train_collections, transform=transform_train,\n",
+    "                            gen_file_path=gen_file_path, relative_path=relative_path, use_linemaps=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:\n",
+    "    def transform_test(img, boxes, labels, linemap):\n",
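+    "        # near-constant scale range and aspect ratio: crops are (almost) square and of fixed relative size\n",
+    "        img, boxes, labels, linemap = random_crop_tile_lm(img, boxes, labels, linemap, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",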
+    "        img, boxes, linemap = resize_lm(img, boxes, linemap, size=(img_size, img_size), random_interpolation=True)\n",
+    "        img = transforms.Compose([\n",
+    "            transforms.ToTensor(),\n",
+    "            transforms.Normalize(mean=[0.5],std=[1.0])\n",
+    "        ])(img)\n",
+    "        boxes, labels = box_coder.encode(boxes, labels, linemap)\n",
+    "        return img, boxes, labels, transforms.ToTensor()(linemap)\n",
+    "else:\n",
+    "    def transform_test(img, boxes, labels):\n",
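+    "        # near-constant scale range and aspect ratio: crops are (almost) square and of fixed relative size\n",
+    "        img, boxes, labels = random_crop_tile(img, boxes, labels, scale_range=[0.85, 0.86], max_aspect_ratio=1.001)\n",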
+    "        img, boxes = resize(img, boxes, size=(img_size, img_size), random_interpolation=True)\n",
+    "        img = transforms.Compose([\n",
+    "            transforms.ToTensor(),\n",
+    "            transforms.Normalize(mean=[0.5],std=[1.0])\n",
+    "        ])(img)\n",
+    "        boxes, labels = box_coder.encode(boxes, labels)\n",
+    "        return img, boxes, labels"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "if with_gen_data:\n",
+    "    testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
+    "                           gen_file_path=None, relative_path=relative_path, use_linemaps=True)\n",
+    "else:\n",
+    "    testset = CuneiformSSD(collections=test_collections, transform=transform_test,\n",
+    "                           gen_file_path=None, relative_path=relative_path, use_linemaps=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "trainloader = data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=3)\n",
+    "testloader = data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Building Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "device = 'cuda' if torch.cuda.is_available() else 'cpu'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# build classifier model (backbone)\n",
+    "basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=1, arch_opt=arch_opt)\n",
+    "\n",
+    "# load pretrained classifier weights\n",
+    "weights_path = '{}results/weights/cuneiNet_basic_{}.pth'.format(relative_path, pretrained_model_version)\n",
+    "basic_net.load_state_dict(torch.load(weights_path))  # , strict=False\n",
+    "basic_net = basic_net.to(device)\n",
+    "\n",
+    "# wrap the pretrained classifier in a feature pyramid network (FPN)\n",
+    "fpn_net = MobileNetV2FPN(basic_net, num_classes=num_classes, width_mult=width_mult, with_p4=with_64).to(device)\n",
+    "\n",
+    "# build the full detector (FPN backbone plus localization and classification heads)\n",
+    "fpnssd_net = FPNSSD(fpn_net, num_classes=num_classes).to(device)\n",
+    "fpnssd_net.train()\n",
+    "\n",
+    "# print model\n",
+    "print(fpnssd_net)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# sanity check: run a random single-channel input of training size through the net and print the output shapes\n",
+    "loc_preds, cls_preds = fpnssd_net(torch.randn(1, 1, img_size, img_size).to(device))\n",
+    "print(loc_preds.size(), cls_preds.size())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Optimization"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "criterion = SSDLoss(num_classes=num_classes)\n",
+    "#criterion = FocalLoss(num_classes=num_classes)\n",
+    "optimizer = optim.SGD(fpnssd_net.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)\n",
+    "\n",
+    "# lr policy\n",
+    "# scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.97)\n",
+    "scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=lr_milestones, gamma=0.1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# init logger\n",
+    "if version_remark == '':\n",
+    "    comment_str = '_{}'.format(model_version)\n",
+    "else:\n",
+    "    comment_str = '_{}_{}'.format(model_version, version_remark)\n",
+    "writer = get_tensorboard_writer(logs_folder='{}results/run_logs/detector'.format(relative_path), comment=comment_str)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# Training\n",
+    "best_loss = float('inf')  # best test loss\n",
+    "best_epoch = 0\n",
+    "best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
+    "\n",
+    "\n",
+    "def train(epoch):\n",
+    "    fpnssd_net.train()\n",
+    "    train_loss = 0\n",
+    "\n",
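+    "    # advance the lr schedule once per epoch; PyTorch >= 1.1 expects this call after optimizer.step()\n",
+    "    scheduler.step()\n",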
+    "\n",
+    "    if with_gen_data:\n",
+    "        for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(trainloader):\n",
+    "            inputs = inputs.to(device)\n",
+    "            loc_targets = loc_targets.to(device)\n",
+    "            cls_targets = cls_targets.to(device)\n",
+    "\n",
+    "            optimizer.zero_grad()\n",
+    "            loc_preds, cls_preds = fpnssd_net(inputs)\n",
+    "            loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
+    "            loss.backward()\n",
+    "            optimizer.step()\n",
+    "\n",
+    "            train_loss += loss.item()\n",
+    "            print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
+    "                  % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
+    "    else:\n",
+    "        for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(trainloader):\n",
+    "            inputs = inputs.to(device)\n",
+    "            loc_targets = loc_targets.to(device)\n",
+    "            cls_targets = cls_targets.to(device)\n",
+    "\n",
+    "            optimizer.zero_grad()\n",
+    "            loc_preds, cls_preds = fpnssd_net(inputs)\n",
+    "            loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
+    "            loss.backward()\n",
+    "            optimizer.step()\n",
+    "\n",
+    "            train_loss += loss.item()\n",
+    "            print('train_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
+    "                  % (loss.item(), train_loss/(batch_idx+1), batch_idx+1, len(trainloader)))\n",
+    "\n",
+    "    # write to logger\n",
+    "    phase = 'train'\n",
+    "    writer.add_scalar('data/{}/loss'.format(phase), train_loss / len(trainloader), epoch)\n",
+    "\n",
+    "def test(epoch):\n",
+    "    fpnssd_net.eval()\n",
+    "    test_loss = 0\n",
+    "    with torch.no_grad():\n",
+    "\n",
+    "        if with_gen_data:\n",
+    "            for batch_idx, (inputs, loc_targets, cls_targets, linemap) in enumerate(testloader):\n",
+    "                inputs = inputs.to(device)\n",
+    "                loc_targets = loc_targets.to(device)\n",
+    "                cls_targets = cls_targets.to(device)\n",
+    "\n",
+    "                loc_preds, cls_preds = fpnssd_net(inputs)\n",
+    "                loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
+    "                test_loss += loss.item()\n",
+    "                print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
+    "                      % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
+    "        else:\n",
+    "            for batch_idx, (inputs, loc_targets, cls_targets) in enumerate(testloader):\n",
+    "                inputs = inputs.to(device)\n",
+    "                loc_targets = loc_targets.to(device)\n",
+    "                cls_targets = cls_targets.to(device)\n",
+    "\n",
+    "                loc_preds, cls_preds = fpnssd_net(inputs)\n",
+    "                loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)\n",
+    "                test_loss += loss.item()\n",
+    "                print('test_loss: %.3f | avg_loss: %.3f [%d/%d]'\n",
+    "                      % (loss.item(), test_loss/(batch_idx+1), batch_idx+1, len(testloader)))\n",
+    "\n",
+    "    # write to logger\n",
+    "    phase = 'test'\n",
+    "    writer.add_scalar('data/{}/loss'.format(phase), test_loss / len(testloader), epoch)\n",
+    "\n",
+    "    # save the weights whenever the test loss improves (epochs 0-5 are skipped)\n",
+    "    global best_loss\n",
+    "    global best_epoch\n",
+    "    test_loss /= len(testloader)\n",
+    "    if test_loss < best_loss and epoch > 5:\n",
+    "        # best_model_wts = copy.deepcopy(fpnssd_net.state_dict())\n",
+    "        weights_path = '{}results/weights/fpn_net_{}_best.pth'.format(relative_path, model_version)\n",
+    "        torch.save(fpnssd_net.state_dict(), weights_path)\n",
+    "        best_epoch = epoch\n",
+    "        best_loss = test_loss"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true,
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "for epoch in tqdm(range(num_epochs)):\n",
+    "    print('\\nEpoch: %d' % epoch)\n",
+    "    train(epoch)\n",
+    "    if epoch % 2 == 0:\n",
+    "        print('\\nTest')\n",
+    "        test(epoch)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "print('Best test loss: {:.4f} at epoch {}'.format(best_loss, best_epoch))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# choose model filename\n",
+    "weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)\n",
+    "# Save only the model parameters\n",
+    "torch.save(fpnssd_net.state_dict(), weights_path)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
@@ -0,0 +1,76 @@
+### Compile and install OpenGM with Python wrapper in virtualenv
+
+#### Online references
+- conda install guide:
+	- https://groups.google.com/forum/#!searchin/opengm/nose%7Csort:relevance/opengm/Nte5Zpu9RL0/YSanK09kNwAJ
+- plain ubuntu install guides:
+	- http://cvlab-dresden.de/HTML/people/bogdan/teaching/slides-script/ml2-ss15/installation-readme.txt
+	- https://memoryaux.wordpress.com/2014/08/15/installing-opengm-with-python-wrapper/
+
+#### Instructions (tested for Ubuntu 14.04)
+
+Clone the source:  
+`git clone https://github.com/opengm/opengm.git`
+
+Create a build directory under opengm/
+
+`mkdir build/`
+
+and enter it:
+
+`cd build/`
+
+Run ccmake and press 'c' to configure:
+
+`ccmake ../`
+
+Run ccmake again and select the options listed below:
+
+`ccmake ../`
+
+
+build:
+  - command line ?
+  - converter ?
+  - docs ?
+  - examples ? (requires external lib like cplex)
+  - python docs ? (requires pip install sphinx and produces ugly outputs)
+  - python wrapper
+  - testing
+  - tutorials
+
+with:
+
+  - boost
+  - hdf5
+
+python:
+
+  - python executable: /home/USER/.virtualenvs/VNAME/bin
+  - include dir: /home/USER/.virtualenvs/VNAME/include
+  - include dir2: /home/USER/.virtualenvs/VNAME/include/python2.7
+  - library: /usr/lib/x86_64-linux-gnu/libpython2.7.so 
+(alternative is /home/USER/.virtualenvs/VNAME/lib/python2.7, but no *.so file here)
+  - library debug: PYTHON_LIBRARY_DEBUG-NOTFOUND (default)
+  - numpy include directory: /home/USER/.virtualenvs/VNAME/lib/python2.7/site-packages/numpy/core/include
+
+*For some unknown reason*, the opengm Python site-package is installed under `/usr/local/lib/python0./`.
+Therefore, it is better to skip `make install` and simply copy the files by hand (see below).
+
+
+To build, run (use -j only on a multicore system):
+
+```
+make -j4
+make -j2 test
+make install
+
+```
+
+
+Then copy the built opengm Python package to `/home/USER/.virtualenvs/VNAME/lib/python2.7/site-packages/`.
+
+Now test in Python:
+`import opengm`
+
+Hopefully things work :)
@@ -0,0 +1,528 @@
+from scipy.spatial.distance import cdist, seuclidean, euclidean, squareform
+import matplotlib.pyplot as plt
+import numpy as np
+import pandas as pd
+from timeit import default_timer as timer
+
+import opengm
+
+from ..utils.bbox_utils import bb_intersection_over_union
+
+
+class LineMatching1D(object):
+
+    def __init__(self, tl_line_rec, region_det, line_rec, line_pts, stats, scale=1.0, sign_hypos=None, param_dict=None):
+        # create graphical model from fragment
+        self.stats = stats
+        self.scale = scale
+        self.scaled_sign_height = stats.tblSignHeight * scale
+        self.min_sign_dist = self.scaled_sign_height / 2.  # distance between sign centers
+
+        self.tl_line_rec = tl_line_rec
+
+        # null hypothesis for signs in tl
+        self.sign_hypos = sign_hypos
+
+        # detections contained in rectangular area around respective alignments
+        # [ID, cx, cy, score, x1, y1, x2, y2, idx]
+        self.region_det = region_det
+
+        # init
+        self.num_vars = len(self.tl_line_rec)
+        self.num_relevant = 0
+        self.max_cost = 1e10  # 1e11  # "infinite" cost
+
+        # only continue, if there is a sign in line to match
+        if self.num_vars > 0:
+
+            # compute num_lbls_per_var from detections
+            ulbls, counts = np.unique(self.region_det[:, 0], return_counts=True)
+            hypo_det_counts = np.array([counts[ulbls == item] if item in ulbls else 0 for item in self.tl_line_rec.lbl],
+                                       dtype=int).squeeze()
+
+            self.tl_line_rec['det_count'] = hypo_det_counts
+            # optional: remove vars without detections
+            if False:
+                # self.tl_line_rec = self.tl_line_rec[hypo_det_counts > 0]
+                self.tl_line_rec = self.tl_line_rec.iloc[np.where(hypo_det_counts > 0)]  # deal with scalar case of boolean indexing
+                # hypo_det_counts = hypo_det_counts[hypo_det_counts > 0]
+
+            # only continue if there is at least a single matching detection
+            self.num_relevant = np.sum(counts[np.isin(ulbls, self.tl_line_rec.lbl)])
+            if self.num_relevant > 0:
+                # update num_vars
+                self.num_vars = len(self.tl_line_rec)
+                # opengm setup
+                self.num_lbls_per_var = max(counts[np.isin(ulbls, self.tl_line_rec.lbl)]) + 1  # + 1 outlier detection
+                var_space = np.ones(self.num_vars) * self.num_lbls_per_var
+                self.gm = opengm.gm(var_space)
+
+                # parameter setup
+                if param_dict is not None:
+                    self.params = param_dict
+                else:
+                    self.params = dict()
+
+                    # extra settings
+                    self.params['outlier_cost'] = 10
+                    self.params['angle_long_range'] = True
+
+                    # unary potentials
+                    self.params['lambda_score'] = 0.3
+                    self.params['sigma_score'] = 0.4
+
+                    self.params['lambda_offset'] = 1  # currently offset used linearly without exp function
+                    self.params['sigma_offset'] = 1   # lambda & sigma have no influence!
+
+                    # pairwise binary potentials
+                    self.params['lambda_p'] = 3  # 1
+                    self.params['sigma_p'] = 3
+
+                    self.params['lambda_angle'] = 2
+                    self.params['sigma_angle'] = 0.6
+
+                    self.params['lambda_iou'] = 2
+                    self.params['sigma_iou'] = 0.4
+
+                    # OPTIONAL: strong penalties for long range connections
+                    if True:
+                        self.params['lr_lambda_angle'] = 0.05
+                        self.params['lr_sigma_angle'] = 0.1
+
+                        self.params['lr_lambda_iou'] = 0.1
+                        self.params['lr_sigma_iou'] = 0.05
+                    else:
+                        self.params['lr_lambda_angle'] = self.params['lambda_angle']
+                        self.params['lr_sigma_angle'] = self.params['sigma_angle']
+
+                        self.params['lr_lambda_iou'] = self.params['lambda_iou']
+                        self.params['lr_sigma_iou'] = self.params['sigma_iou']
+
+                # angle of hypothesis line
+                self.b = line_pts[-1, :] - line_pts[0, :]
+                # print 'hypo angle:', np.arctan2(self.b[1], self.b[0]) * (180 / np.pi), self.b
+
+                # offset
+                self.Xb = line_pts[0, :].reshape(1, -1)
+
+                # define variance between line distance and sign distance - for seuclidean and mahalanobis
+                self.variance_p = np.array([1, 0.2], dtype=float)  # [8, 1] [1, 1]
+
+                if False:
+                    print('#syms:', len(self.tl_line_rec), 'max#dets_per_sym:', self.num_lbls_per_var - 1,
+                          'relevant#dets:', self.num_relevant, 'total#dets:', self.region_det.shape[0])
+
+                # print(np.vstack([self.fm_hypo_df.lbl, hypo_det_counts])).astype(int)
+                # print self.fm_hypo_df
+
+                # assemble potentials
+                self.add_unary()
+                self.add_pairwise()
+
+    def add_unary(self):
+
+        # for monitoring costs
+        self.unary_score_ct = {}
+        self.unary_offset_ct = {}
+        self.unary_det_ct = {}
+        # compute for later usage by alignment vector
+        self.det_same_label_ct_idx = []
+
+        # assemble unary potentials
+        for vidx, (tl_sign_idx, fm_sign) in enumerate(self.tl_line_rec.iterrows()):
+            lbl = int(fm_sign.lbl)
+            # ctr = [float(fm_sign.ctr_l), float(fm_sign.ctr_r)]
+
+            # print vidx, lbl, ctr
+            # get boxes of certain lbl
+            det_same_label = self.region_det[self.region_det[:, 0] == lbl]
+            self.det_same_label_ct_idx.append(det_same_label[:, -1])
+            # detection locations
+            Xa = det_same_label[:, [1, 2]]
+
+            # incorporate score
+            unary_vec = np.ones(self.num_lbls_per_var) * self.max_cost
+
+            U1, U2, U3 = [], [], []
+            if det_same_label.shape[0] > 0:
+                # compute partial cost (vectorized)
+                U1 = 1 - det_same_label[:, 3]
+                # since goal of matching is to incorporate low confidence detections,
+                # linear contribution of score might be enough / otherwise penalize only low confidence below 0.01
+                if True:
+                    U1 = self.params['lambda_score'] * (np.exp(U1 / self.params['sigma_score']) - 1)
+
+                # incorporate distance from hypothesis line
+                # cx, cy
+                U2 = np.zeros(len(U1))
+                if False:  # disabled in favour of null hypo offset
+                    if self.params['lambda_offset'] != 0:
+                        U2 = cdist(Xa, self.Xb, lambda u, v: np.linalg.norm(np.cross(u - v, self.b))
+                                                             / np.linalg.norm(self.b)) / self.min_sign_dist
+                        # compute partial cost
+                        U2 = self.params['lambda_offset'] * (np.exp(U2.squeeze() / self.params['sigma_offset']) - 1)
+
+                # incorporate null hypothesis of signs
+                U3 = np.zeros(len(U1))
+                if self.params['lambda_offset'] != 0 and self.sign_hypos is not None:
+                    # sign hypo location
+                    X0 = self.sign_hypos[vidx, 0:2].reshape(1, -1)
+                    # get sign width and set variance
+                    if lbl in self.stats.sign_df.index:
+                        sign_width = self.stats.get_sign_width(lbl)
+                    else:
+                        sign_width = 1
+                    var = np.array([sign_width * 1, 1], dtype=float)
+                    # compute pairwise distance
+                    U3 = cdist(X0, Xa, metric='seuclidean', V=var) / self.min_sign_dist
+                    # U3 = self.params['lambda_offset'] * (np.exp(U3.squeeze() / self.params['sigma_offset']) - 1)
+                    # U3 = np.clip(U3, 0, 1e-5 * self.max_cost)
+
+                # sum up cost and insert into unary vector (only replace values if there are detections)
+                unary_vec[:len(U1)] = U1 + U2 + U3
+
+            # for outlier detection set specific unary cost
+            unary_vec[-1] = self.params['outlier_cost']
+
+            # add function and factor
+            func_id = self.gm.addFunction(unary_vec)
+            self.gm.addFactor(func_id, vidx)
+
+            # for debugging
+            # self.unary_score_ct.append(U1)
+            # self.unary_offset_ct.append(U3)
+            # self.unary_det_ct.append(det_same_label)
+            self.unary_score_ct[vidx] = U1
+            self.unary_offset_ct[vidx] = U3
+            self.unary_det_ct[vidx] = det_same_label
+
+    def add_pairwise(self):
+        # assemble pairwise potentials
+        # Assumption: vars are in order of symbols in line
+        # ATTENTION: ORDER of fm_hypo_lbls is important for pairwise potential generation!!!
+
+        self.pairwise_dist_ct = {}
+        self.pairwise_angle_ct = {}
+        self.pairwise_iou_ct = {}
+        self.pairwise_long_range = {}
+        for vidx in range(self.num_vars - 1):
+            # setup basic matrix with maximum cost
+            dist_mat = np.ones([self.num_lbls_per_var] * 2) * self.max_cost
+
+            sym_lt = self.tl_line_rec.lbl.iat[vidx]
+            sym_rt = self.tl_line_rec.lbl.iat[vidx + 1]
+            # get boxes according to labels
+            # [ID, cx, cy, score, x1, y1, x2, y2]
+            det_sym_lt = self.region_det[self.region_det[:, 0] == sym_lt]
+            det_sym_rt = self.region_det[self.region_det[:, 0] == sym_rt]
+
+            # x2, cy
+            # sym_lt_right_border = det_sym_lt[:,[6,2]]
+            # x1, cy
+            # sym_rt_left_border = det_sym_rt[:,[4,2]]
+
+            # cx, cy
+            sym_lt_right_border = det_sym_lt[:, [1, 2]]
+            sym_rt_left_border = det_sym_rt[:, [1, 2]]
+
+            # bboxes
+            sym_lt_bboxes = det_sym_lt[:, 4:]
+            sym_rt_bboxes = det_sym_rt[:, 4:]
+
+            # compute pairwise distances between detections of lt and rt sym
+
+            # 1) basic computation
+            # X = cdist(sym_lt_right_border, sym_rt_left_border, metric='euclidean')
+            X = cdist(sym_lt_right_border, sym_rt_left_border, metric='seuclidean', V=self.variance_p)
+            # because vertical offset always depends on underlying rotation, mahalanobis should be used here
+            # X = cdist(sym_lt_right_border, sym_rt_left_border, metric='mahalanobis', VI=self.VI)
+            # normalize by the scaled sign height and subtract 1, so that a (standardized) center distance of one sign height maps to 0
+            X = ((X/self.scaled_sign_height) - 1)
+
+            inX = X.copy()
+            # compute partial cost
+            X = self.params['lambda_p'] * (np.exp(X / self.params['sigma_p']) - 1)
+
+            # 2) penalty for wrong side
+            # if on wrong side, increase cost by factor 4  [is deprecated due to angle computation!!!]
+            #X2 = cdist(sym_lt_right_border, sym_rt_left_border, lambda u, v: u[0] > v[0])
+            #X[X2.astype(bool)] *= 5
+
+            # 3) penalize distance only in x-dimension
+            # X8 = cdist(sym_lt_right_border, sym_rt_left_border, lambda u, v: v[0] - u[0])
+            # X8 = self.params['lambda_p'] * np.exp((self.min_sign_dist - X8) / self.params['sigma_p'])
+
+            # incorporate angle
+            # angle with x-axis: np.arctan((u[1]-v[1])/(u[0]-v[0]))
+            # b=np.array([1,0])
+            # angle between vectors less stable: acos(dot(v1, v2) / (norm(v1) * norm(v2)))
+            # X3 = cdist(sym_lt_right_border, sym_rt_left_border,
+            #           lambda u,v: np.arccos(np.dot(v-u,b) / (np.linalg.norm(v-u) * np.linalg.norm(b))))/pi
+            # angle between vectors, numerically more stable: atan2(norm(cross(a,b)), dot(a,b))
+            X3 = cdist(sym_lt_right_border, sym_rt_left_border,
+                       lambda u, v: np.arctan2(np.linalg.norm(np.cross(v - u, self.b)), np.dot(v - u, self.b))) / np.pi
+            inX3 = X3.copy()
+            # compute partial cost
+            X3 = self.params['lambda_angle'] * (np.exp(X3 / self.params['sigma_angle']) - 1)
+
+            # incorporate IoU
+            X4 = cdist(sym_lt_bboxes, sym_rt_bboxes,
+                       lambda u, v: bb_intersection_over_union(u, v))
+            inX4 = X4.copy()
+            # compute partial cost
+            X4 = self.params['lambda_iou'] * (np.exp(X4 / self.params['sigma_iou']) - 1)
+
+            # sum up cost and insert into dist_mat
+            dist_mat[:X.shape[0], :X.shape[1]] = X + X3 + X4
+
+            # for outlier class set pairwise cost to 0
+            dist_mat[-1, :] = 0
+            dist_mat[:, -1] = 0
+
+            # avoid identity solutions
+            if sym_lt == sym_rt:
+                np.fill_diagonal(dist_mat, self.max_cost)
+
+            # add function and factor
+            func_id = self.gm.addFunction(dist_mat)
+            self.gm.addFactor(func_id, [vidx, vidx + 1])
+
+            # for debugging
+            # self.pairwise_dist_ct[(vidx, vidx + 1)] = inX
+            # self.pairwise_angle_ct[(vidx, vidx + 1)] = inX3
+            # self.pairwise_iou_ct[(vidx, vidx + 1)] = inX4
+            self.pairwise_dist_ct[(vidx, vidx + 1)] = X
+            self.pairwise_angle_ct[(vidx, vidx + 1)] = X3
+            self.pairwise_iou_ct[(vidx, vidx + 1)] = X4
+
+            # in the case of angles add pairwise potentials for all possible combinations
+            if self.params['angle_long_range']:
+                # add combinations on the right of var
+                # not necessary to add combinations on the left of var due to symmetry
+                for vidx_rt in range(vidx + 2, self.num_vars):
+                    sym_rt = self.tl_line_rec.lbl.iat[vidx_rt]
+                    # detections
+                    det_sym_rt = self.region_det[self.region_det[:, 0] == sym_rt]
+                    # cx, cy
+                    sym_rt_left_border = det_sym_rt[:, [1, 2]]
+                    # bboxes
+                    sym_rt_bboxes = det_sym_rt[:, 4:]
+                    # incorporate angle
+                    # angle between vectors, numerically more stable: atan2(norm(cross(a,b)), dot(a,b))
+                    XY3 = cdist(sym_lt_right_border, sym_rt_left_border,
+                               lambda u, v: np.arctan2(np.linalg.norm(np.cross(v - u, self.b)),
+                                                       np.dot(v - u, self.b))) / np.pi
+                    # compute partial cost
+                    XY3 = self.params['lr_lambda_angle'] * (np.exp(XY3 / self.params['lr_sigma_angle']) - 1)
+
+                    # incorporate iou
+                    XY4 = cdist(sym_lt_bboxes, sym_rt_bboxes,
+                               lambda u, v: bb_intersection_over_union(u, v))
+
+                    # compute partial cost
+                    XY4 = self.params['lr_lambda_iou'] * (np.exp(XY4 / self.params['lr_sigma_iou']) - 1)
+
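+                    # reset dist_mat so that no costs from a previous factor remain outside the updated block
+                    dist_mat = np.ones([self.num_lbls_per_var] * 2) * self.max_cost
+                    # sum up cost and insert into dist_mat
+                    dist_mat[:XY3.shape[0], :XY3.shape[1]] = XY3 + XY4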
+
+                    # for outlier class set pairwise cost to 0
+                    dist_mat[-1, :] = 0
+                    dist_mat[:, -1] = 0
+
+                    # avoid identity solutions
+                    if sym_lt == sym_rt:
+                        np.fill_diagonal(dist_mat, self.max_cost)
+
+                    # add function and factor
+                    func_id = self.gm.addFunction(dist_mat)
+                    self.gm.addFactor(func_id, [vidx, vidx_rt])
+
+                    # for debugging
+                    self.pairwise_long_range[(vidx, vidx_rt)] = XY3 + XY4  # XY3, XY4, XY3 + XY4
+
+    def run_inference(self):
+        # only continue, if there is a sign/detection in line to match
+        if len(self.tl_line_rec) > 0 and self.num_relevant > 0:
+
+            if False:
+                # basic belief propagation (slower)
+                bfprop = opengm.inference.BeliefPropagation(gm=self.gm)
+            if True:
+                # TRWS: https://github.com/opengm/opengm/blob/master/src/interfaces/python/opengm/inference/pyTrws.cxx
+                # default params: https://github.com/opengm/opengm/blob/master/src/interfaces/python/opengm/inference/param/trws_external_param.hxx
+                parameter = opengm.InfParam(steps=200)
+                bfprop = opengm.inference.TrwsExternal(gm=self.gm, accumulator='minimizer', parameter=parameter)
+
+            #start = timer()
+            bfprop.infer()
+            #run_time = timer() - start
+            #print('{}'.format(run_time))
+
+            # get and save labeling
+            self.labeling = bfprop.arg()
+            self.tl_line_rec['lbl_arg'] = bfprop.arg()
+
+            # get raw energy and check if inference failed
+            self.raw_energy = self.gm.evaluate(bfprop.arg())
+            self.inference_failed = self.raw_energy > self.num_vars * self.params['outlier_cost']
+
+            # get energy, normalize by num_vars * outlier_cost
+            # worst case should be outliers only
+            max_line_cost = self.num_vars * self.params['outlier_cost']
+            # clip energy, because inference sometimes fails !?
+            self.energy = min(self.raw_energy, max_line_cost) / float(max_line_cost)
+            # attributes cost to individual assignments (selected detections) and normalize using outlier_cost
+            self.tl_line_rec['nE'] = np.around(self.compute_labeling_energy() / self.params['outlier_cost'], decimals=2)
+
+            if self.inference_failed:
+                # all outlier
+                self.tl_line_rec['aligned_det_idx'] = -1
+                self.tl_line_rec['region_det_idx'] = -1
+            else:
+                # compute actual alignments with respect to original detections indices
+                self._compute_global_alignments()
+                # compute alignments with respect to region detection indices
+                self._compute_region_alignments()
+
+    def _compute_global_alignments(self):
+        alignments = np.zeros((len(self.tl_line_rec), 1), dtype=int)
+        for i, lbl in enumerate(self.labeling):
+            if lbl != (self.num_lbls_per_var - 1) and len(self.det_same_label_ct_idx[i]) > 0:
+                alignments[i] = self.det_same_label_ct_idx[i][lbl]
+            else:
+                # outlier
+                alignments[i] = -1
+        # set values in dataframe
+        self.tl_line_rec['aligned_det_idx'] = alignments.astype(int)
+
+    def _compute_region_alignments(self):
+        alignments = np.zeros((len(self.tl_line_rec), 1), dtype=int)
+        for ii, global_det_idx in enumerate(self.tl_line_rec.aligned_det_idx.values):
+            if global_det_idx != -1:
+                # map global to region detection index
+                alignments[ii] = np.where(self.region_det[:, -1] == global_det_idx)[0]
+            else:
+                # outlier detection
+                alignments[ii] = -1
+        # set values in dataframe
+        self.tl_line_rec['region_det_idx'] = alignments.astype(int)
+
+    def get_region_alignments(self):
+        # maybe I should also return the self.tl_line_rec.index
+        # problem arises if self.tl_line_rec is changed inside LineMatching1D
+        if 'region_det_idx' in self.tl_line_rec.columns:
+            return self.tl_line_rec.region_det_idx.values
+        else:
+            return []
+
+    def visualize_matching(self, input_im, sign_hypos, ax=None):
+        # only continue, if there is a sign in line to match
+        if len(self.tl_line_rec) > 0:
+
+            # select detections using alignment index
+            alignments = self.get_region_alignments()
+            aligned = self.region_det[alignments[alignments >= 0], 1:3]
+
+            if ax is None:
+                fig, ax = plt.subplots(figsize=(12, 8))
+            # plot hypo
+            ax.plot(sign_hypos[:, 0], sign_hypos[:, 1], '*b', markersize=10, label='null hypo')
+            ax.plot(aligned[:, 0], aligned[:, 1], 'oy', markersize=8, label='gm aligned detections')
+            # plot tablet
+            ax.imshow(input_im, cmap=plt.cm.Greys_r)
+
+            # annotate
+            for i, pos_idx in enumerate(self.tl_line_rec.iloc[alignments >= 0].pos_idx.values):
+                ax.annotate(pos_idx, (aligned[i, 0], aligned[i, 1]), fontsize=15)
+
+            ax.legend(shadow=True, fancybox=True)
+            ax.axis('off')
+            # plt.show()
+
+    # energy marginal computation
+
+    def _get_unary_cost(self, unary_dict, vidx, didx):
+        unary = unary_dict[vidx]
+        if len(unary) > 0:
+            return unary.flatten()[didx]
+        else:
+            # in case inference fails and the labeling is out of bounds
+            return self.max_cost
+
+    def _get_pairwise_val(self, pairwise_dict, idx0, idx1):
+        outlier_lbl = self.num_lbls_per_var - 1
+        pairwise = pairwise_dict[idx0, idx1]
+        didx0 = self.labeling[idx0]
+        didx1 = self.labeling[idx1]
+        if didx0 != outlier_lbl and didx1 != outlier_lbl:
+            if pairwise.size > 0:
+                return pairwise[didx0, didx1]
+            else:
+                # in case inference fails and the labeling is out of bounds
+                return self.max_cost
+        else:
+            return 0
+
+    def _get_pairwise_cost(self, pairwise_dict, vidx):
+        # deal with boundary cases
+        if vidx == self.num_vars - 1:
+            return self._get_pairwise_val(pairwise_dict, vidx - 1, vidx)
+        elif vidx == 0:
+            return self._get_pairwise_val(pairwise_dict, vidx, vidx + 1)
+        else:
+            return (self._get_pairwise_val(pairwise_dict, vidx - 1, vidx)
+                    + self._get_pairwise_val(pairwise_dict, vidx, vidx + 1))
+
+    def _get_lr_pairwise_cost(self, lr_pairwise_dict, vidx):
+        energy = 0
+        for vidx_rt in range(vidx + 2, self.num_vars):
+            energy += self._get_pairwise_val(lr_pairwise_dict, vidx, vidx_rt)
+        return energy
+
+    def compute_unary_cost(self):
+        list_unary = [self.unary_score_ct, self.unary_offset_ct]
+        outlier_lbl = self.num_lbls_per_var - 1
+        u_marginals = np.zeros_like(self.labeling, dtype=float)
+        for vidx, didx in enumerate(self.labeling):
+            if didx != outlier_lbl:
+                for unary_dict in list_unary:
+                    u_marginals[vidx] += self._get_unary_cost(unary_dict, vidx, didx)
+            else:
+                u_marginals[vidx] += self.params['outlier_cost']
+
+        return u_marginals
+
+    def compute_pairwise_cost(self):
+        list_pairwise = [self.pairwise_angle_ct, self.pairwise_dist_ct, self.pairwise_iou_ct]
+
+        p_marginals = np.zeros_like(self.labeling, dtype=float)
+        if len(self.labeling) > 1:  # only compute if there are any pairs
+            for vidx, dvidx in enumerate(self.labeling):
+                for pairwise_dict in list_pairwise:
+                    p_marginals[vidx] += self._get_pairwise_cost(pairwise_dict, vidx)
+        return p_marginals
+
+    def compute_pairwise_cost_lr(self):
+        lr_pairwise_dict = self.pairwise_long_range
+
+        p_marginals = np.zeros_like(self.labeling, dtype=float)
+        if len(self.labeling) > 1:  # only compute if there are any pairs
+            for vidx, dvidx in enumerate(self.labeling):
+                p_marginals[vidx] += self._get_lr_pairwise_cost(lr_pairwise_dict, vidx)
+        return p_marginals
+
+    def compute_labeling_energy(self):
+        # compute an energy vector that attributes cost to individual labels;
+        # summing up the output vector yields the un-normalized energy
+
+        # deal with case when inference failed
+        if self.inference_failed:
+
+            return self.max_cost
+        else:
+            u_marginals = self.compute_unary_cost()
+            p_marginals = self.compute_pairwise_cost()
+            plr_marginals = self.compute_pairwise_cost_lr()
+
+            return u_marginals + (p_marginals/2. + plr_marginals)
+
@@ -0,0 +1,464 @@
+import numpy as np
+import pandas as pd
+import math
+
+import matplotlib.pyplot as plt
+
+from operator import itemgetter
+
+from scipy.stats import norm
+from scipy import ndimage as ndi
+from scipy.spatial.distance import pdist, cdist, squareform
+
+from LineFragment import LineFragment
+
+
+# LINES - TRANSLITERATION ALIGNMENT PROBLEM
+# associate lines with transliteration lines
+
+
+# OPTION 0) use line_models sorted by dist as basic alignment
+
+def align_lines_tl_by_sort(line_hypos, tl_df):
+    # use tl_line_indices
+    tl_line_indices = tl_df.line_idx.unique()
+
+    # extend or cut if too short or long respectively
+    diff_len = line_hypos.label.nunique() - len(tl_line_indices)
+    if diff_len > 0:
+        last_idx = tl_line_indices[-1] + 1
+        tl_line_indices = np.concatenate([tl_line_indices, range(last_idx,  last_idx + diff_len)])
+    else:
+        tl_line_indices = tl_line_indices[:line_hypos.label.nunique()]
+
+    # print tl_line_indices, line_hypos.groupby('label').mean().sort_values('dist').index
+
+    # find basic alignment by sorting (enumerate line models sorted according to dist)
+    tl_line_assignment = pd.DataFrame({'tl_line': tl_line_indices,  #  np.arange(line_hypos.label.nunique())
+                                       'hypo_line_lbl': line_hypos.groupby('label').mean().sort_values('dist').index})
+
+    # add tl_line column in line_hypos using join on line_hypos
+    return line_hypos.join(tl_line_assignment.set_index('hypo_line_lbl'), on='label')
+
+
+# OPTION 1) use ground truth annotations as alignment
+# use gt line annotations (implicit update tl_line column in line_hypos)
+# (unreliable, because gt line annotations and transliteration are not necessarily aligned themselves!)
+
+def align_lines_tl_by_ground_truth(line_hypos, tl_df):
+    # update tl_line with gt_line_idx (set nan to -1)
+    #line_hypos['tl_line'] = line_hypos['gt_line_idx'].fillna(-1)
+    line_hypos = line_hypos.assign(tl_line=line_hypos['gt_line_idx'].fillna(-1))
+    # if there are more gt_lines than tl_lines ...
+    # gt_line_idx that are not in tl are replaced with -1
+    not_tl_line_idx = ~line_hypos['tl_line'].isin(tl_df.line_idx.unique())
+    line_hypos.loc[not_tl_line_idx, 'tl_line'] = -1
+    return line_hypos
+
+
+# OPTION 2) adapted from the Gale-Church algorithm for sentence alignment
+# relies on line lengths only
+
+norm_logsf = norm.logsf
+LOG2 = math.log(2)
+
+AVERAGE_CHARACTERS = 1
+VARIANCE_CHARACTERS = 6.8
+
+BEAD_COSTS = {(1, 1): 0, (2, 1): 1000,  # (2, 1): 230
+              (1, 2): 1000, (0, 1): 230,
+              (1, 0): 230, (2, 2): 2000}  # (1, 0): 450
+
+# BEAD_COSTS = {(1, 1): 0, (2, 1): 230, (1, 2): 230, (0, 1): 450,
+#                (1, 0): 450, (2, 2): 440}
+
+
+def length_cost(sx, sy, mean_xy, variance_xy):
+    """
+    Code from https://github.com/alvations/gachalign:
+    Calculate the length cost of two sentences. Lower cost = higher probability.
+
+    The original Gale-Church (1993:pp. 81) paper considers l2/l1 = 1 hence:
+    delta = (l2-l1*c)/math.sqrt(l1*s2)
+
+    If l2/l1 != 1 then the following should be considered:
+    delta = (l2-l1*c)/math.sqrt((l1+l2*c)/2 * s2)
+    substituting c = 1 and c = l2/l1, gives the original cost function.
+    """
+    lx, ly = sum(sx), sum(sy)
+    m = (lx + ly * mean_xy) / 2
+
+    try:
+        delta = (lx - ly * mean_xy) / math.sqrt(m * variance_xy)
+    except ZeroDivisionError:
+        return float('-inf')
+
+    return - 100 * (LOG2 + norm_logsf(abs(delta)))
+
+
+def _align(x, y, mean_xy, variance_xy, bead_costs):
+    """
+    The minimization function to choose the sentence pair with
+    cheapest alignment cost.
+    """
+    m = {}
+    for i in range(len(x) + 1):
+        for j in range(len(y) + 1):
+            if i == j == 0:
+                m[0, 0] = (0, 0, 0)
+            else:
+                m[i, j] = min((m[i - di, j - dj][0] + length_cost(x[i - di:i], y[j - dj:j], mean_xy, variance_xy)
+                               + bead_cost, di, dj)
+                              for (di, dj), bead_cost in bead_costs.items()
+                              if i - di >= 0 and j - dj >= 0)
+
+    i, j = len(x), len(y)
+    while True:
+        (c, di, dj) = m[i, j]
+        if di == dj == 0:
+            break
+        yield (i - di, i), (j - dj, j)
+        i -= di
+        j -= dj
+
+
+def align_lines_tl_by_gale_church(tl_df, line_hypos, variance_characters=3.0):
+    # updates line_hypos with tl_line idx
+    # actually uses line_hypos_agg
+
+    # get line lengths
+    tl_line_len = tl_df.groupby('line_idx').mean().prior_line_len
+    det_line_len = line_hypos.groupby('label').mean().sort_values('dist').accum
+    # define input
+
+    cx = tl_line_len.values
+    cy = det_line_len.values
+
+    # use detection line lengths to normalize (better range than tl lengths)
+    max_char = int(cy.max())
+
+    # normalize
+    cx /= cx.max()
+    cx *= max_char
+    #cy /= cy.max()
+    #cy *= max_char
+    bc = BEAD_COSTS
+
+    # iterate over aligned pairs
+    for (i1, i2), (j1, j2) in reversed(list(_align(cx, cy, 1.0, variance_characters, bc))):
+        # print (i1, i2), (j1, j2)
+        # print (tl_line_len.index[i1:i2].values, det_line_len.index[j1:j2].values)
+        # check if line_hypo exists
+        if len(det_line_len.index[j1:j2].values) > 0:
+            tl_line_idx = -1
+            if len(tl_line_len.index[i1:i2].values) > 0:
+                tl_line_idx = int(tl_line_len.index[i1:i2].values[0])
+            # assign tl line idx to detected line
+            line_hypos.loc[line_hypos.label.isin(det_line_len.index[j1:j2].values), 'tl_line'] = tl_line_idx
+    # return cx, cy
+    return line_hypos
+
+
+# OPTION 3) adapted from the Bleualign algorithm for sentence alignment
+# relies on matching score between tl null hypothesis and sign detections (sign detector)
+# the problem is to align hypo_line_indices (detected lines) with tl_line_indices (transliteration lines)
+# all information required is contained in line fragment
+
+# a) make sure that score_mat forms valid positive weights for edges in graph
+# b) get matching score matrix with shape=[len(hypo_line_indices), len(tl_line_indices)]
+# c) alignment consists of segments that are connected diagonally
+
+
+def compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag):
+    # bleu score
+    score_mat = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
+                   lambda a_idx, b_idx: line_frag.compute_bleu_score(a_idx.squeeze(), b_idx.squeeze()))  # 5/5, 4/1
+    # score in range [0, 1], but order needs to be reversed
+    score_mat = 1 - score_mat
+    return score_mat
+
+
+def compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag):
+    # ransac score
+    score_mat = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
+                   lambda a_idx, b_idx: line_frag.compute_ransac_score(a_idx.squeeze(), b_idx.squeeze(),
+                                                                       max_dist_thresh=2, dist_weight=1))  # 5/5, 4/1
+    # score in range [0, 1], but order needs to be reversed
+    score_mat = 1 - score_mat
+    return score_mat
+
+
+def compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag):
+    # line matching score
+    score_mat = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
+                   lambda a_idx, b_idx: line_frag.compute_line_matching_score(a_idx.squeeze(), b_idx.squeeze()))
+
+    # score in range [0, 1], but order needs to be reversed
+    score_mat = 1 - score_mat
+    return score_mat
+
+
+# use this if you want to implement your own similarity score
+def eval_sents_dummy(translist, targetlist, max_alternatives=3):
+    scoredict = {}
+
+    for testID, testSent in enumerate(translist):
+        scores = []
+
+        for refID, refSent in enumerate(targetlist):
+            score = 100 - abs(len(testSent) - len(refSent))  # replace this with your own similarity score
+            if score > 0:
+                scores.append((score, refID, score))
+        # sorted by first item in tuple (i.e. score)
+        scoredict[testID] = sorted(scores, key=itemgetter(0), reverse=True)[:max_alternatives]
+
+    return scoredict
+
+
+# follow the backpointers in score matrix to extract best path of 1-to-1 alignments
+def extract_best_path(pointers):
+
+    i = len(pointers)-1
+    j = len(pointers[0])-1
+    pointer = ''
+    best_path = []
+
+    while i >= 0 and j >= 0:
+        pointer = pointers[i][j]
+        if pointer == '^':
+            i -= 1
+        elif pointer == '<':
+            j -= 1
+        elif pointer == 'match':
+            best_path.append((i, j))
+            i -= 1
+            j -= 1
+
+    best_path.reverse()
+    return best_path
+
+
+# dynamic programming search for best path of alignments (maximal score)
+def pathfinder(translist, targetlist, scoremat):  # scoredict
+
+    # add an extra row/column to the matrix and start filling it from 1,1 (to avoid exceptions for first row/column)
+    matrix = [[0 for column in range(len(targetlist)+1)] for row in range(len(translist)+1)]
+    pointers = [['' for column in range(len(targetlist))] for row in range(len(translist))]
+
+    for i in range(len(translist)):
+        for j in range(len(targetlist)):
+
+            best_score = matrix[i][j+1]
+            best_pointer = '^'
+
+            score = matrix[i+1][j]
+            if score > best_score:
+                best_score = score
+                best_pointer = '<'
+
+            #if np.abs(j - i) < 5:  # distance from diagonal
+            score = scoremat[i, j] + matrix[i][j]
+            if score > best_score:
+                best_score = score
+                best_pointer = 'match'
+
+            matrix[i+1][j+1] = best_score
+            pointers[i][j] = best_pointer
+
+    bleualign = extract_best_path(pointers)
+    return bleualign
+
+
+def align_lines_tl_by_score(line_hypos, line_frag, visualize=True):
+    # alignment based on longest path through score mat (topological sort)
+    assert 'tl_line' in line_hypos.columns, "tl_line needs to be set (e.g. use align by sort)"
+
+    # get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
+    hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
+    # print(hypo_line_indices, tl_line_indices)
+
+    align_opts = [1, 0, 0, 0, 0, 0]  # align + ransac, most accurate, slow  [NORMAL]
+    #align_opts = [0, 0, 1, 0, 0, 0]  # bleu + ransac, a little less accurate, fast  [use with high number of detections]
+    #align_opts = [0, 0, 0, 0, 0, 1]  # bleu
+    #align_opts = [0, 0, 0, 0, 1, 0]  # ransac
+    #align_opts = [0, 0, 0, 1, 0, 0]  # align
+
+    assert np.sum(align_opts) == 1, "exactly one alignment option must be enabled"
+
+    # prepare score mats
+    if align_opts[0]:
+        score_mats = [compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag),
+                      compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
+        title_strs = ['ransac', 'gm matching']
+        multi_score = True
+    if align_opts[1]:
+        score_mats = [compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag),
+                      compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
+        title_strs = ['bleu', 'gm matching']  # bleu
+        multi_score = True
+    if align_opts[2]:
+        score_mats = [compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag),
+                      compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
+        title_strs = ['ransac', 'bleu']
+        multi_score = True
+    if align_opts[3]:
+        score_mats = [compute_matching_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
+        title_strs = ['gm matching']
+        multi_score = False
+    if align_opts[4]:
+        score_mats = [compute_ransac_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
+        title_strs = ['ransac']
+        multi_score = False
+    if align_opts[5]:
+        score_mats = [compute_bleu_score_mat(hypo_line_indices, tl_line_indices, line_frag)]
+        title_strs = ['bleu']
+        multi_score = False
+
+
+    if visualize:
+        # prepare plot
+        fig, axes = plt.subplots(1, 3, figsize=(15, 5))  # 15, 5
+        ax = axes.ravel()
+
+    best_paths = []
+    for i, score_mat in enumerate(score_mats):
+        best_path = pathfinder(hypo_line_indices, tl_line_indices, score_mat)
+        best_paths.append(best_path)
+        path_pts = np.asarray(best_path)
+
+        if visualize:
+            # plot score mats with best path
+            if len(path_pts) > 0:
+                ax[i].plot(path_pts[:, 1], path_pts[:, 0])
+                ax[i].plot(path_pts[:, 1], path_pts[:, 0], 'cd')
+            ax[i].imshow(score_mat)
+            ax[i].set_title(title_strs[i])
+
+    # compute joint
+    if multi_score:
+        path_pts = np.asarray(sorted(set(best_paths[0]).intersection(best_paths[1])))
+        # path_pts = np.asarray(best_paths[1])  # use gm_matching only
+    else:
+        path_pts = np.asarray(best_paths[0])
+
+    if visualize:
+        # plot averaged score mat with joint path
+        if len(path_pts) > 0:
+            ax[2].plot(path_pts[:, 1], path_pts[:, 0])
+            ax[2].plot(path_pts[:, 1], path_pts[:, 0], 'cd')
+        ax[2].imshow(np.mean(score_mats, axis=0))  # average of all score mats (also works for a single one)
+        ax[2].set_title('joint')
+
+    if len(path_pts) > 0:
+        # map path through score mat back to line_idx (because score mat idx does not necessarily equal line idx)
+        # hl_indices = hypo_line_indices[path_pts[:, 0]]  # already equals index to dataframe
+        tl_indices = tl_line_indices[path_pts[:, 1]]
+
+        # print tl_line_assignment, path_pts[:, 0], tl_indices
+
+        # create assignment table for join
+        basic_index = line_frag.line_hypos.tl_line.sort_values().unique()
+        tl_line_assignment = pd.DataFrame({'hypo_tl_line': basic_index, 'tl_line_update': -np.ones_like(basic_index)})
+        tl_line_assignment.loc[path_pts[:, 0], 'tl_line_update'] = tl_indices
+        # join line_hypos on tl_line
+        line_hypos['tl_line'] = line_hypos.join(tl_line_assignment.set_index('hypo_tl_line'), on='tl_line')[
+            'tl_line_update']
+
+    return line_hypos, path_pts
+
+
+####  full pipeline to solve the line-transliteration alignment problem ####
+
+def compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment, segm_labels, stats, center_im, sign_detections,
+                              visualize=True, align_opt=(False, False, True)):
+
+    path_pts = None
+
+    # BASIC:
+    # use line_models sorted by dist as basic alignment
+    line_hypos = align_lines_tl_by_sort(line_hypos, tl_df)
+
+    # OPTION I:
+    # find basic alignment using line lengths
+    # apply Gale-Church algorithm (implicit update tl_line column in line_hypos)
+    if align_opt[0]:  # False
+        line_hypos = align_lines_tl_by_gale_church(tl_df, line_hypos, variance_characters=6.0)
+
+    # OPTION II:
+    # use gt line annotations (implicit update tl_line column in line_hypos)
+    if align_opt[1]:  # False
+        if len(gt_line_assignment) > 0:
+            line_hypos = align_lines_tl_by_ground_truth(line_hypos, tl_df)
+
+    # OPTION III:
+    # alignment based on longest path through score mat (topological sort)
+    if align_opt[2]:  # True
+        # create line fragment (tl_line should be assigned before!)
+        line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
+        # compute lines tl alignment based on score
+        (line_hypos, path_pts) = align_lines_tl_by_score(line_hypos, line_frag, visualize=visualize)
+
+    return line_hypos, path_pts
+
+
+
+## GT function
+
+
+def gt_align_lines_tl_by_ed(line_gt, visualize=True):
+    # alignment based on longest path through score mat (topological sort)
+
+    # get assignment space (cartesian product of tl_line_indices and gt_line_indices)
+    gt_line_indices, tl_line_indices = line_gt.get_alignment_space()
+
+    # prepare score mats
+    score_mats = [compute_bleu_score_mat(gt_line_indices, tl_line_indices, line_gt)]
+    title_strs = ['edit distance']
+    multi_score = False
+
+    if visualize:
+        # prepare plot
+        fig, axes = plt.subplots(1, 1, figsize=(15, 5), squeeze=False)  # 1,3
+        ax = axes.ravel()
+
+    best_paths = []
+    for i, score_mat in enumerate(score_mats):
+        best_path = pathfinder(gt_line_indices, tl_line_indices, score_mat)
+        best_paths.append(best_path)
+        path_pts = np.asarray(best_path)
+
+        if visualize:
+            # plot score mats with best path
+            if len(path_pts) > 0:
+                ax[i].plot(path_pts[:, 1], path_pts[:, 0])
+                ax[i].plot(path_pts[:, 1], path_pts[:, 0], 'cd')
+            ax[i].imshow(score_mat)
+            ax[i].set_title(title_strs[i])
+
+    # compute joint
+    if multi_score:
+        path_pts = np.asarray(sorted(set(best_paths[0]).intersection(best_paths[1])))
+        # path_pts = np.asarray(best_paths[1])  # use gm_matching only
+    else:
+        path_pts = np.asarray(best_paths[0])
+
+    if len(path_pts) > 0:
+        # map path through score mat back to line_idx (because score mat idx does not necessarily equal line idx)
+        # gt_indices = gt_line_indices[path_pts[:, 0]]  # already equals index to dataframe
+        tl_indices = tl_line_indices[path_pts[:, 1]]
+        # print tl_line_assignment, path_pts[:, 0], tl_indices
+
+        lines_df = line_gt.lines_df
+        # create assignment table for join
+        #basic_index = lines_df.tl_line.sort_values().unique()  # this is not necessary, because gt_line_idx
+        basic_index = lines_df.gt_line_idx.sort_values().unique()
+        tl_line_assignment = pd.DataFrame({'gt_tl_line': basic_index, 'tl_line_update': -np.ones_like(basic_index)})
+        tl_line_assignment.loc[path_pts[:, 0], 'tl_line_update'] = tl_indices
+        # print tl_line_assignment, path_pts
+
+        # join lines_df on gt_line_idx
+        line_gt.lines_df['tl_line'] = lines_df.join(tl_line_assignment.set_index('gt_tl_line'), on='gt_line_idx')['tl_line_update']
+
+    return line_gt, path_pts
+
+
@@ -0,0 +1,289 @@
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+from tqdm import tqdm
+
+from skimage.color import label2rgb
+
+from ..transliteration.TransliterationSet import TransliterationSet
+from ..transliteration.SignsStats import SignsStats
+
+from ..evaluations.sign_tl_evaluation import compute_accuracy
+from ..evaluations.line_tl_evaluation import eval_line_tl_alignment
+from ..evaluations.sign_evaluation_prep import get_pred_boxes_df, get_gt_boxes_df
+from ..evaluations.sign_evaluation_gt import prepare_segment_gt
+from ..evaluations.sign_evaluation import eval_detector_on_collection
+from ..evaluations.sign_evaluator import SignEvalBasic, SignEvalFast
+
+from ..alignment.line_tl_alignment import compute_line_tl_alignment
+from ..alignment.LineFragment import (LineFragment, compute_line_points, compute_line_polygon, plot_boxes)
+
+from ..detection.line_detection import (prepare_transliteration, preprocess_line_input, apply_detector,
+                                        post_process_line_detections, compute_image_label_map)
+from ..detection.detection_helpers import (visualize_net_output, radius_in_image, convert_detections_to_array,
+                                                label_map2image, vis_detections, coord_in_image)
+#from ..detection.tablet_scale_estimation import print_scale_stats
+
+from ..visualizations.line_visuals import (show_hough_transform_w_lines, show_line_segms, show_line_skeleton, show_probabilistic_hough)
+from ..visualizations.line_tl_visuals import show_lines_tl_alignment, show_score_mats_with_paths
+
+
+def gen_alignments(didx_list, dataset, bbox_anno, lines_anno, relative_path, saa_version, re_transform,
+                   sign_model_version, model_fcn, device,
+                   generate_and_save, show_sign_alignments, collection_subfolder, train_data_ext_file, lbl_list,
+                   line_model_version='v007', use_precomp_lines=False, param_dict=None,
+                   show_line_matching=False, verbose=True):
+    """
+    Generate tl-line pairs for seq model training. Store pairs in file.
+    Additionally, compute some useful filter criteria for the generated pairs.
+    """
+
+    # config tl_line matching
+    # 1: line length
+    # 2: use gt line anno (if available)
+    # 3: best (longest) path through score matrix
+    align_opt = [False, False, True]
+    visualize_tl_line_matching = show_line_matching
+
+    # setup evaluators
+    use_new_eval = True
+    num_classes = 240
+    eval_ovthresh = 0.5
+    eval_basic = SignEvalBasic(sign_model_version, saa_version, eval_ovthresh)
+    eval_fast = SignEvalFast(sign_model_version, saa_version, tp_thresh=eval_ovthresh, num_classes=num_classes)
+
+    # setup transliteration set
+    tl_set = TransliterationSet(collections=[saa_version], relative_path=relative_path)
+    # setup sign statistics
+    stats = SignsStats(tblSignHeight=128)
+
+    list_pred_boxes_df, list_gt_boxes_df = [], []
+    acc_array = np.zeros(len(didx_list))
+    naligned_array = np.zeros(len(didx_list))
+    for didx in tqdm(didx_list, desc=saa_version):
+        seg_im, seg_idx = dataset[didx]
+        # access meta
+        seg_rec = dataset.assigned_segments_df.loc[seg_idx]
+        image_name, scale, seg_bbox, image_path, view_desc = dataset.get_segment_meta(seg_rec)
+        print(didx, image_name, view_desc)
+
+        # load transliteration dataframe
+        tl_df, num_lines = tl_set.get_tl_df(seg_rec, verbose=verbose)
+        tl_df, num_vis_lines, len_min, len_max = prepare_transliteration(tl_df, num_lines, stats)
+        #print(float(len_min) / len_max, num_vis_lines)
+
+        # boxes file
+        res_name = "{}{}".format(image_name, view_desc)
+        res_path = "{}results/results_ssd/{}/{}".format(relative_path, sign_model_version, saa_version)
+        boxes_file = "{}/{}_all_boxes.npy".format(res_path, res_name)
+
+        # load detections
+        all_boxes = np.load(boxes_file)
+        sign_detections = convert_detections_to_array(all_boxes)
+
+        # load and prepare annotations of segment
+        gt_boxes, gt_labels = prepare_segment_gt(seg_idx, scale, bbox_anno,
+                                                 with_star_crop=False)  # depends on sign_detections!
+        if verbose:
+            print('Load annotations: {} gt bboxes found.'.format(len(gt_boxes)))
+
+        # make sure seg image is large enough for line detector
+        if seg_im.size[0] > 224 and seg_im.size[1] > 224:
+
+            if use_precomp_lines:
+                # to numpy
+                center_im = np.asarray(seg_im)
+                # lbl_ind
+                line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
+                lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
+                # lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
+                lbl_ind_x = np.load(lines_file).astype(int)
+            else:
+                # prepare input
+                inputs = preprocess_line_input(seg_im, 1, shift=0)
+                center_im = re_transform(inputs[4])  # to pil image
+                center_im = np.asarray(center_im)  # to numpy
+                # apply network
+                #print(inputs.shape)
+                output = apply_detector(inputs, model_fcn, device)
+                # visualize_net_output(center_im, output, cunei_id=1, num_classes=2)
+                # plt.show()
+
+                # prepare output
+                outprob = np.mean(output, axis=0)
+                lbl_ind = np.argmax(outprob, axis=0)
+
+                lbl_ind_x = lbl_ind.copy()
+                lbl_ind_x[np.max(outprob, axis=0) < 0.7] = 0  # line detector dependent (VIP) # outprob.squeeze() # this fixes a bug!
+
+                lbl_ind_80 = lbl_ind.copy()
+                lbl_ind_80[np.max(outprob, axis=0) < 0.8] = 0  # outprob.squeeze() # this fixes a bug!
+
+            # only continue if there is a positive line detection
+            # (avoids unnecessary computation and an error in skimage hough_line_peaks)
+            if np.any(lbl_ind_x):
+
+                # for line detection apply postprocessing pipeline
+                (line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line,
+                 h, theta, d, skeleton) = post_process_line_detections(lbl_ind_x, num_vis_lines, len_min, len_max, verbose=verbose)
+
+                if len(line_segs) > 0:
+                    # compute overlay
+                    seg_canvas = compute_image_label_map(segm_labels, center_im.shape)
+                    image_label_overlay = label2rgb(seg_canvas, image=center_im)
+
+                # using line annotations: gt_line_idx for hypo_lines
+                gt_line_assignment = lines_anno.get_assignment_for_line_hypos(seg_idx, line_hypos.groupby('label').mean())
+
+                if len(gt_line_assignment) > 0:
+                    # clean join on line_hypos
+                    line_hypos = line_hypos.join(gt_line_assignment.set_index('hypo_line_lbl'), on='label')
+                    ## clean join on line_hypos_agg
+                    # line_frag.line_hypos_agg.join(gt_line_assignment.set_index('hypo_line_lbl'))
+
+                if len(tl_df) > 0:
+
+                    # abort if obvious transliteration / lines mismatch
+                    if np.abs(tl_df.line_idx.nunique() - line_hypos.label.nunique()) > 10:
+                        print("CANCEL segment [{}] : Due to obvious transliteration / lines mismatch".format(seg_idx))
+                        continue
+
+                    #### line-transliteration alignment problem ####
+
+                    line_hypos, path_pts = compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment,
+                                                                     segm_labels, stats, center_im, sign_detections,
+                                                                     visualize=visualize_tl_line_matching,
+                                                                     align_opt=align_opt)
+
+                    # FINISH lines-tl alignment
+
+                    # create line fragment (tl_line should be assigned before?!)
+                    line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
+                    # get assigned tl indices
+                    assigned_tl_indices = line_frag.get_assigned_lines_idx()
+                    # get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
+                    hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
+
+                    # evaluate line-tl alignment using gt-line annotations; only quality indicator because unreliable
+                    if len(gt_line_assignment) > 0 and verbose:
+                        eval_line_tl_alignment(line_frag, lines_anno, seg_idx, num_vis_lines)
+
+                # common colormap
+                # color = plt.cm.jet(np.linspace(0,1,len(angles)))
+                cmap = plt.get_cmap('nipy_spectral')
+                color = cmap(np.linspace(0, 1, len(line_hypos)))
+
+                # estimate scale
+                if False:
+                    if len(tl_df) == 0:
+                        # use line detection estimates
+                        num_lines = line_hypos.label.nunique()
+                        len_max = line_hypos.groupby('label').mean().accum.max() / dist_interline_median
+
+                    # get scales using different approaches
+                    # use num_lines for scale estimation (NOT num_vis_lines!)
+                    print_scale_stats(seg_rec, scale, lbl_ind_x, lbl_ind_80, num_lines, len_max,
+                                      line_hypos, dist_interline_median)
+
+                if False:
+                    show_line_skeleton(lbl_ind_x, skeleton)
+                    plt.show()
+
+                if False:
+                    show_hough_transform_w_lines(lbl_ind_x, center_im, h, theta, d, line_hypos, color)
+
+                if len(line_segs) > 0:
+                    if False:
+                        show_probabilistic_hough(lbl_ind_x, center_im, line_segs, ls_labels, group2line, color)
+
+                    if False:
+                        show_line_segms(image_label_overlay, segm_labels)
+
+                if len(tl_df) > 0:
+                    if False:
+                        show_lines_tl_alignment(lbl_ind_x, center_im, line_hypos, color)
+
+                    if False:
+                        show_score_mats_with_paths(assigned_tl_indices, hypo_line_indices, tl_line_indices, line_frag)
+
+                    if True:
+
+                        if show_sign_alignments:
+                            aligned_list, tablet_tl_df = line_frag.tab_visualize_gm_alignments(refined=True)  # refined=True, does not help/hurt
+                        else:
+                            refined = False
+                            if param_dict is not None:
+                                if 'refined' in param_dict:
+                                    refined = param_dict['refined']
+                            aligned_list, tablet_tl_df = line_frag.tab_get_gm_alignments(refined=refined,
+                                                                                         param_dict=param_dict)  # refined=True, does not help/hurt
+                        if len(gt_boxes) > 0:
+
+                            if use_new_eval:
+                                if len(aligned_list) > 0:
+                                    all_boxes = [[el] for el in aligned_list]
+                                    if False:
+                                        # standard mAP eval
+                                        eval_basic.eval_segment(all_boxes, gt_boxes, gt_labels, seg_idx, verbose=verbose)
+                                    # fast evaluation
+                                    eval_fast.eval_segment(all_boxes, gt_boxes, gt_labels, seg_idx, verbose=verbose)
+                                    # get segment statistics of current segment [-1]
+                                    num_tp, num_fp, _, acc, mean_ap, global_ap = eval_fast.get_seg_summary(-1)
+                                    # save acc to array
+                                    acc_array[didx_list.index(didx)] = acc
+                                    # save naligned to array
+                                    naligned_array[didx_list.index(didx)] = num_tp + num_fp
+                            else:
+                                # prepare full collection evaluation
+                                list_pred_boxes_df.append(get_pred_boxes_df([[el] for el in aligned_list], seg_idx))
+                                list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_idx))
+
+                                # get num aligned across all classes
+                                naligned = np.sum([len(el) for i, el in enumerate(aligned_list) if i > 0])
+
+                                if verbose and len(aligned_list) > 0:
+                                    # [METHOD B]: evaluate mAP and print stats for a single segment
+                                    # (these results can strongly differ from collection-wise evaluation)
+                                    acc, df_stats = compute_accuracy(gt_boxes, gt_labels, aligned_list, return_stats=True)
+                                    # save acc to array
+                                    acc_array[didx_list.index(didx)] = acc
+
+                                    ntfpos = df_stats.tp.sum() + df_stats.fp.sum()
+                                    # print ntfpos, naligned
+
+                                    # save naligned to array
+                                    naligned_array[didx_list.index(didx)] = ntfpos  # naligned
+
+                        if generate_and_save:
+                            line_frag.tab_generate_training_data(collection_subfolder, train_data_ext_file,
+                                                                 image_name, image_path, scale, seg_idx, seg_bbox,
+                                                                 tablet_tl_df, lbl_list, append=True)
+
+            else:
+                print('No lines detected for {}[{}] and thus no alignment performed!'.format(image_name, seg_idx))
+
+        else:
+            print('Segment image for {}[{}] too small!'.format(image_name, seg_idx))
+
+        # make plots appear
+        plt.show()
+
+    # full collection eval
+    acc = 0
+    df_stats = []
+    if use_new_eval:
+        eval_fast.prepare_eval_collection()
+        df_stats, global_ap = eval_fast.eval_collection(verbose=verbose)
+        num_tp, num_fp, num_fp_global, acc = eval_fast.get_col_summary()
+    else:
+        if len(list_gt_boxes_df) > 0:
+            # [METHOD C]: compute mAP across all instances of individual classes
+            # (these results can strongly differ from segment-wise evaluation)
+            gt_boxes_df = pd.concat(list_gt_boxes_df, ignore_index=True)
+            pred_boxes_df = pd.concat(list_pred_boxes_df, ignore_index=True)
+            acc, df_stats = eval_detector_on_collection(gt_boxes_df, pred_boxes_df, ovthresh=None)  # set fixed!
+    return acc, df_stats    # acc_array, naligned_array
+
+
+
+
@@ -0,0 +1,187 @@
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+from tqdm import tqdm
+
+from skimage.color import label2rgb
+
+from ..transliteration.TransliterationSet import TransliterationSet
+from ..transliteration.SignsStats import SignsStats
+
+from ..evaluations.sign_evaluation_gt import prepare_segment_gt
+
+from ..alignment.line_tl_alignment import compute_line_tl_alignment
+from ..alignment.LineFragment import LineFragment, plot_boxes
+
+from ..detection.line_detection import prepare_transliteration, post_process_line_detections, compute_image_label_map
+
+from ..utils.bbox_utils import convert_bbox_global2local, box_iou
+from ..utils.nms import nms
+
+
+def convert_sign_rec_to_array(seg_gen_annos, relative_bboxes, scale):
+    """ Maintains data frame index in last column """
+    list_detections = []
+    for anno_idx, anno_rec in seg_gen_annos.iterrows():
+        # [ID, cx, cy, score, x1, y1, x2, y2, idx]
+        temp = np.zeros(9)
+        box = np.array(relative_bboxes[anno_idx]) * scale
+        temp[0] = anno_rec.newLabel
+        temp[1] = (box[2] + box[0]) / 2
+        temp[2] = (box[3] + box[1]) / 2
+        temp[3] = anno_rec.det_score
+        temp[4:8] = box[0:4]
+        temp[8] = anno_idx
+        list_detections.append(temp)
+    # stack
+    detections_arr = np.vstack(list_detections)
+    return detections_arr
+
+
+def gen_cond_hypo_alignments(didx_list, dataset, bbox_anno, lines_anno, anno_df, relative_path, saa_version,
+                             collection_subfolder, train_data_ext_file, lbl_list, generate_and_save,
+                             min_dets_inline=2, ncompl_thresh=20, smooth_y=True, max_dist_det=3,
+                             line_model_version='v007', visualize_hypos=False):
+
+    # setup transliteration set
+    tl_set = TransliterationSet(collections=[saa_version], relative_path=relative_path)
+    # setup sign statistics
+    stats = SignsStats(tblSignHeight=128)
+
+    # for seg_im, seg_idx in dataset:
+    for didx in tqdm(didx_list, desc=saa_version):
+        seg_im, seg_idx = dataset[didx]
+        # access meta
+        seg_rec = dataset.assigned_segments_df.loc[seg_idx]
+        image_name, scale, seg_bbox, image_path, view_desc = dataset.get_segment_meta(seg_rec)
+        res_name = "{}{}".format(image_name, view_desc)
+
+        # load transliteration dataframe
+        tl_df, num_lines = tl_set.get_tl_df(seg_rec, verbose=True)
+
+        if len(tl_df) > 0:  # only continue if transliteration is available
+            tl_df, num_vis_lines, len_min, len_max = prepare_transliteration(tl_df, num_lines, stats)
+            print(float(len_min) / len_max, num_vis_lines)
+
+            # boxes file
+
+            # select generated annos per segment
+            seg_gen_annos = anno_df[anno_df.seg_idx == seg_idx]
+
+            if False:
+                # control completeness filter (redundant - additional filter inside create conditional hypos)
+                filter_nms = False
+                compl_thresh = -1  # 0, 2, 3, 4, 5, 6  disable: -1
+                ncompl_thresh = -1  # 10, 15, 20   disable: -1
+                # filter using nms
+                if filter_nms:
+                    seg_gen_annos = seg_gen_annos[seg_gen_annos.nms_keep]
+                if compl_thresh > -1:
+                    # filter using compl
+                    seg_gen_annos = seg_gen_annos[seg_gen_annos.compl > compl_thresh]
+                if ncompl_thresh > -1:
+                    # filter using ncompl (normalized completeness)
+                    seg_gen_annos = seg_gen_annos[seg_gen_annos.ncompl > ncompl_thresh]
+
+            if len(seg_gen_annos) > 0:
+
+                # convert to all boxes
+                relative_bboxes = seg_gen_annos.bbox.apply(lambda x: convert_bbox_global2local(x, list(seg_bbox)))
+                sign_detections = convert_sign_rec_to_array(seg_gen_annos, relative_bboxes, scale)
+
+                # load and prepare annotations of segment
+                gt_boxes, gt_labels = prepare_segment_gt(seg_idx, scale, bbox_anno,
+                                                         with_star_crop=False)  # depends on sign_detections!
+                print('Load annotations: {} gt bboxes found.'.format(len(gt_boxes)))
+
+                # make sure seg image is large enough for line detector
+                if seg_im.size[0] > 224 and seg_im.size[1] > 224 and len(tl_df) > 0:
+
+                    # prepare input
+                    # to numpy
+                    center_im = np.asarray(seg_im)
+                    # lbl_ind
+                    line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
+                    lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
+                    # lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
+                    lbl_ind_x = np.load(lines_file).astype(int)
+
+                    # only continue if there is a positive line detection
+                    # (avoids unnecessary computation and an error in skimage hough_line_peaks)
+                    if np.any(lbl_ind_x):
+
+                        # for line detection apply postprocessing pipeline
+                        (line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line,
+                         h, theta, d, skeleton) = post_process_line_detections(lbl_ind_x, num_vis_lines, len_min, len_max)
+
+                        if len(line_segs) > 0:
+                            # compute overlay
+                            seg_canvas = compute_image_label_map(segm_labels, center_im.shape)
+                            image_label_overlay = label2rgb(seg_canvas, image=center_im)
+
+                        # using line annotations: gt_line_idx for hypo_lines
+                        gt_line_assignment = lines_anno.get_assignment_for_line_hypos(seg_idx,
+                                                                                      line_hypos.groupby('label').mean())
+
+                        if len(gt_line_assignment) > 0:
+                            # clean join on line_hypos
+                            line_hypos = line_hypos.join(gt_line_assignment.set_index('hypo_line_lbl'), on='label')
+                            ## clean join on line_hypos_agg
+                            # line_frag.line_hypos_agg.join(gt_line_assignment.set_index('hypo_line_lbl'))
+
+                        if len(tl_df) > 0:
+
+                            # abort if obvious transliteration / lines mismatch
+                            if np.abs(tl_df.line_idx.nunique() - line_hypos.label.nunique()) > 10:
+                                print(
+                                    "CANCEL segment [{}] : Due to obvious transliteration / lines mismatch".format(seg_idx))
+                                continue
+
+                            #### line-transliteration alignment problem ####
+                            # for train use: align_opt=[False, True, False] (use line annos)
+                            line_hypos, path_pts = compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment,
+                                                                             segm_labels, stats, center_im, sign_detections,
+                                                                             visualize=False,
+                                                                             align_opt=[False, False, True])  # CHANGE HERE
+
+                            # FINISH lines-tl alignment
+
+                            # create line fragment (tl_line should be assigned before?!)
+                            line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
+                            # get assigned tl indices
+                            assigned_tl_indices = line_frag.get_assigned_lines_idx()
+                            # get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
+                            hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
+
+                            if visualize_hypos:
+                                # generate conditional hypo
+                                (tab_t_hypos, tab_t_anno_idx,
+                                 tab_t_meta) = line_frag.tab_create_conditional_hypo_alignments(anno_df=anno_df,
+                                               min_dets_inline=min_dets_inline, ncompl_thresh=ncompl_thresh,
+                                               smooth_y=smooth_y, max_dist_det=max_dist_det)
+                                if len(tab_t_hypos) > 0:
+                                    if False:
+                                        # filter using nms
+                                        nms_th = 0.6
+                                        keep = nms(tab_t_hypos[:, 4:8], tab_t_hypos[:, 3], threshold=nms_th)
+                                        tab_t_hypos = tab_t_hypos[keep]
+                                    # visualize
+                                    plot_boxes(tab_t_hypos[:, 4:8])
+                                    plt.imshow(line_frag.input_im, cmap='gray')
+
+                            # save generated training data
+                            if generate_and_save:
+                                line_frag.tab_generate_cond_hypo_training_data(collection_subfolder, train_data_ext_file,
+                                                                               image_name, image_path, scale, seg_idx, seg_bbox,
+                                                                               lbl_list, append=True, anno_df=anno_df,
+                                                                               min_dets_inline=min_dets_inline, ncompl_thresh=ncompl_thresh,
+                                                                               smooth_y=smooth_y, max_dist_det=max_dist_det)
+                    else:
+                        print('No lines detected for {}[{}] and thus no alignment performed!'.format(image_name, seg_idx))
+                else:
+                    print('segment image for {}[{}] too small!'.format(image_name, seg_idx))
+            else:
+                print('No detections for {}[{}]!'.format(image_name, seg_idx))
+
+            plt.show()
+
@@ -0,0 +1,136 @@
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+from tqdm import tqdm
+
+from skimage.color import label2rgb
+
+from ..transliteration.TransliterationSet import TransliterationSet
+from ..transliteration.SignsStats import SignsStats
+
+from ..evaluations.sign_evaluation_gt import prepare_segment_gt
+
+from ..alignment.line_tl_alignment import compute_line_tl_alignment
+from ..alignment.LineFragment import LineFragment, plot_boxes
+
+from ..detection.line_detection import prepare_transliteration, post_process_line_detections, compute_image_label_map
+
+from ..utils.nms import nms
+
+
+def gen_null_hypo_alignments(didx_list, dataset, bbox_anno, lines_anno, relative_path, saa_version,
+                             collection_subfolder, train_data_ext_file, lbl_list, generate_and_save,
+                             line_model_version='v007', visualize_hypos=False):
+
+    # setup transliteration set
+    tl_set = TransliterationSet(collections=[saa_version], relative_path=relative_path)
+    # setup sign statistics
+    stats = SignsStats(tblSignHeight=128)
+
+    # for seg_im, seg_idx in dataset:
+    for didx in tqdm(didx_list, desc=saa_version):
+        seg_im, seg_idx = dataset[didx]
+        # access meta
+        seg_rec = dataset.assigned_segments_df.loc[seg_idx]
+        image_name, scale, seg_bbox, image_path, view_desc = dataset.get_segment_meta(seg_rec)
+        res_name = "{}{}".format(image_name, view_desc)
+
+        # load transliteration dataframe
+        tl_df, num_lines = tl_set.get_tl_df(seg_rec, verbose=True)
+
+        if len(tl_df) > 0:  # only continue if transliteration is available
+            tl_df, num_vis_lines, len_min, len_max = prepare_transliteration(tl_df, num_lines, stats)
+            print(float(len_min) / len_max, num_vis_lines)
+
+            # load and prepare annotations of segment
+            gt_boxes, gt_labels = prepare_segment_gt(seg_idx, scale, bbox_anno,
+                                                     with_star_crop=False)  # depends on sign_detections!
+            print('Load annotations: {} gt bboxes found.'.format(len(gt_boxes)))
+
+            sign_detections = None
+
+            # make sure seg image is large enough for line detector
+            if seg_im.size[0] > 224 and seg_im.size[1] > 224 and len(tl_df) > 0:
+
+                # prepare input
+                # to numpy
+                center_im = np.asarray(seg_im)
+                # lbl_ind
+                line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
+                lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
+                # lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
+                lbl_ind_x = np.load(lines_file).astype(int)
+
+                # only continue if there is a positive line detection
+                # (avoids unnecessary computation and an error in skimage hough_line_peaks)
+                if np.any(lbl_ind_x):
+
+                    # for line detection apply postprocessing pipeline
+                    (line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line,
+                     h, theta, d, skeleton) = post_process_line_detections(lbl_ind_x, num_vis_lines, len_min, len_max)
+
+                    if len(line_segs) > 0:
+                        # compute overlay
+                        seg_canvas = compute_image_label_map(segm_labels, center_im.shape)
+                        image_label_overlay = label2rgb(seg_canvas, image=center_im)
+
+                    # using line annotations: gt_line_idx for hypo_lines
+                    gt_line_assignment = lines_anno.get_assignment_for_line_hypos(seg_idx,
+                                                                                  line_hypos.groupby('label').mean())
+
+                    if len(gt_line_assignment) > 0:
+                        # clean join on line_hypos
+                        line_hypos = line_hypos.join(gt_line_assignment.set_index('hypo_line_lbl'), on='label')
+                        ## clean join on line_hypos_agg
+                        # line_frag.line_hypos_agg.join(gt_line_assignment.set_index('hypo_line_lbl'))
+
+                    if len(tl_df) > 0:
+
+                        # abort if obvious transliteration / lines mismatch
+                        if np.abs(tl_df.line_idx.nunique() - line_hypos.label.nunique()) > 10:
+                            print(
+                                "CANCEL segment [{}] : Due to obvious transliteration / lines mismatch".format(seg_idx))
+                            continue
+
+                        #### line-transliteration alignment problem ####
+                        # for train use: align_opt=[False, True, False] (use line annos)
+                        line_hypos, path_pts = compute_line_tl_alignment(line_hypos, tl_df, gt_line_assignment,
+                                                                         segm_labels, stats, center_im, sign_detections,
+                                                                         visualize=False,
+                                                                         align_opt=[True, False, False])  # CHANGE HERE
+
+                        # FINISH lines-tl alignment
+
+                        # create line fragment (tl_line should be assigned before?!)
+                        line_frag = LineFragment(line_hypos, segm_labels, tl_df, stats, center_im, sign_detections)
+                        # get assigned tl indices
+                        assigned_tl_indices = line_frag.get_assigned_lines_idx()
+                        # get assignment space (cartesian product of tl_line_indices and hypo_line_indices)
+                        hypo_line_indices, tl_line_indices = line_frag.get_alignment_space()
+
+                        if visualize_hypos:
+                            # generate null hypo
+                            tab_t_hypos = line_frag.tab_create_null_hypo_alignments()
+                            if len(tab_t_hypos) > 0:
+                                if False:
+                                    # filter using nms
+                                    nms_th = 0.6
+                                    keep = nms(tab_t_hypos[:, 4:8], tab_t_hypos[:, 3], threshold=nms_th)
+                                    tab_t_hypos = tab_t_hypos[keep]
+                                # visualize
+                                plot_boxes(tab_t_hypos[:, 4:8])
+                                plt.imshow(line_frag.input_im, cmap='gray')
+
+                        # save generated training data
+                        if generate_and_save:
+                            line_frag.tab_generate_null_hypo_training_data(collection_subfolder,
+                                                                           train_data_ext_file,
+                                                                           image_name, image_path, scale, seg_idx,
+                                                                           seg_bbox,
+                                                                           lbl_list, append=True)
+                else:
+                    print('No lines detected for {}[{}] and thus no alignment performed!'.format(image_name, seg_idx))
+            else:
+                print('segment image for {}[{}] too small!'.format(image_name, seg_idx))
+
+            # print plot
+            plt.show()
+
@@ -0,0 +1,7 @@
+### Dataset classes
+
+- `lines_dataset.py`: used for line segmentation training
+- `cunei_dataset.py`: used for sign classification training
+- `cunei_dataset_ssd.py`: used for sign detector training
+- `cunei_dataset_segments.py`: used for evaluation on full tablet images (image + bbox annotations)
+- `segments_dataset.py`: used for evaluation on full tablet images (image only)
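
The dataset classes in this commit that are visible here subclass `torch.utils.data.Dataset`, so they appear to follow the map-style protocol: `__getitem__` maps an index to a sample and `__len__` reports the number of samples. The sketch below illustrates that protocol without requiring PyTorch. `ToySignDataset`, its arguments, and the nested-list "image" are hypothetical stand-ins for illustration and are not part of this repository. The inclusive box coordinates are an assumption based on the `x2 - x1 + 1` width convention used in the code.

```python
class ToySignDataset(object):
    """Hypothetical stand-in for a map-style dataset of sign crops."""

    def __init__(self, image, annotations, transform=None):
        # image: 2D list of gray values (rows of pixels), standing in for a tablet image
        # annotations: list of (label, (x1, y1, x2, y2)) with inclusive pixel coordinates
        self.image = image
        self.annotations = annotations
        self.transform = transform

    def __getitem__(self, index):
        label, (x1, y1, x2, y2) = self.annotations[index]
        # crop the bounding box (x2 and y2 are inclusive, hence the + 1)
        crop = [row[x1:x2 + 1] for row in self.image[y1:y2 + 1]]
        if self.transform is not None:
            crop = self.transform(crop)
        return crop, label

    def __len__(self):
        return len(self.annotations)


# a 4x4 toy "tablet" and two annotated "signs"
toy_image = [[10 * r + c for c in range(4)] for r in range(4)]
toy_annos = [(7, (0, 0, 1, 1)), (3, (2, 1, 3, 3))]
dataset = ToySignDataset(toy_image, toy_annos)

for idx in range(len(dataset)):
    crop, label = dataset[idx]
    print(label, len(crop[0]), len(crop))  # label, crop width, crop height
```

Because such an object only relies on `__getitem__` and `__len__`, it should be usable with a `torch.utils.data.DataLoader` once a `transform` converts the crops into equally sized tensors.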
@@ -0,0 +1,237 @@
+import pandas as pd
+from future.utils import iteritems
+from tqdm import tqdm
+from ast import literal_eval
+
+from PIL import Image
+import torch.utils.data as data
+
+from ..utils.bbox_utils import *
+from ..utils.transform_utils import crop_pil_image, spatial_sample
+
+DEBUG_MODE = False
+
+
+class CuneiformCollection(data.Dataset):
+
+    def __init__(self, params, transform=None, target_transform=None, relative_path='../', split='train', top_k=-1, top_k_pick=-1, pad_to_square=True):
+
+        self.gray_mean = params['gray_mean']
+        self.context_pad = params['context_pad']
+        if 'test' in split:
+            self.context_pad = 0  # no padding needed
+        self.num_classes = params['num_classes']
+        self.min_align_ratio = 0.6
+        if 'min_align_ratio' in params:
+            self.min_align_ratio = params['min_align_ratio']
+
+        # transforms for data preparation
+        self.transform = transform
+        self.target_transform = target_transform
+        self.pad_to_square = pad_to_square
+
+        self.compl_thresh, self.ncompl_thresh = -1, -1
+        if 'compl_thresh' in params:
+            self.compl_thresh = params['compl_thresh']
+        if 'ncompl_thresh' in params:
+            self.ncompl_thresh = params['ncompl_thresh']
+
+        # load annotations
+        annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, split)
+        meta_df = pd.read_csv(annotation_file, engine='python')  # read annotation file
+
+        # additional annos (investigate impact of additional train data)
+        if 'train' in split and 'extra_collections' in params:
+            list_annos = [meta_df]
+            for collection in params['extra_collections']:
+                annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, collection)
+                anno_df = pd.read_csv(annotation_file, engine='python')  # read annotation file
+                list_annos.append(anno_df)
+            meta_df = pd.concat(list_annos, ignore_index=True)
+
+        # add missing columns to meta_df
+        nd_bbox = np.array(meta_df['bbox'].apply(literal_eval).tolist())  # convert to ndarray
+        meta_df['x1'] = nd_bbox[:, 0]
+        meta_df['y1'] = nd_bbox[:, 1]
+        meta_df['x2'] = nd_bbox[:, 2]
+        meta_df['y2'] = nd_bbox[:, 3]
+        meta_df['imageName'] = meta_df['tablet_CDLI'] + '.jpg'
+        meta_df['image_path'] = '{}data/images/'.format(relative_path) + meta_df['collection'] \
+                                + '/' + meta_df['imageName']
+
+        ### load and prepare gen_df
+        # append with gen alignments
+        gen_cols = ['imageName', 'folder', 'image_path', 'label', 'train_label',
+                    'x1', 'y1', 'x2', 'y2', 'width', 'height', 'segm_idx',
+                    'line_idx', 'pos_idx', 'det_score', 'm_score', 'align_ratio', 'nms_keep', 'compl', 'ncompl']
+
+        # segm_idx,tablet_CDLI,view_desc,collection,mzl_label,train_label,bbox,relative_bbox
+
+        collections_ext = [split]
+
+        if 'train' in split:
+            # OPT I : use csv file that contains list of generated boxes
+            if 'gen_file' in params:
+                gen_df = pd.read_csv(params['gen_file'], engine='python', header=None,  delimiter=', ', names=gen_cols)  # delimiter might need to be removed?!
+            # OPT II : load collection-specific csv files and concatenate them
+            elif 'gen_collections' in params:
+                assert params.get('gen_folder') is not None, 'When using gen_collections, user needs to provide gen_folder!'
+                df_list = []
+                for gen_coll in params['gen_collections']:
+                    gen_file_path = "{}results/{}line_generated_bboxes_refined80_{}.csv".format(relative_path,
+                                                                                                params['gen_folder'], gen_coll)
+                    gen_df = pd.read_csv(gen_file_path, delimiter=r',\s*', engine='python', header=None, names=gen_cols)   # raw string avoids an invalid escape sequence; alternative: delimiter=', '
+                    df_list.append(gen_df)
+                gen_df = pd.concat(df_list, ignore_index=True)
+            # prepare gen_df
+            if ('gen_file' in params) or ('gen_collections' in params):
+
+                # IMPORTANT: filter gen data according to align ratio (copy so that the column updates below act on an independent frame)
+                gen_df = gen_df[gen_df.align_ratio > self.min_align_ratio].copy()
+
+                # IMPORTANT: fill nan values in a way that avoids filtering
+                gen_df.compl = gen_df.compl.fillna(50)
+                gen_df.ncompl = gen_df.ncompl.fillna(100)
+
+                num_before_filter = len(gen_df)
+                if self.compl_thresh > -1:
+                    # filter using compl
+                    gen_df = gen_df[gen_df.compl > self.compl_thresh]  # 0, 2, 4, 5
+                    print('Completeness {} :: Removed {} samples. [{}]'.format(self.compl_thresh,
+                                                                               num_before_filter - len(gen_df),
+                                                                               len(gen_df)))
+                elif self.ncompl_thresh > -1:
+                    # filter using ncompl (only reached if compl_thresh is disabled, because of the elif)
+                    gen_df = gen_df[gen_df.ncompl > self.ncompl_thresh]  # e.g. 10, 15, 20
+                    print('Completeness (norm.) {} :: Removed {} samples. [{}]'.format(self.ncompl_thresh,
+                                                                                       num_before_filter - len(gen_df),
+                                                                                       len(gen_df)))
+                print('class sample count stats: ')
+                print(gen_df.train_label.value_counts().describe())
+
+                # add/update additional columns
+                gen_df['collection'] = gen_df.folder.str.split('/').str[0]
+                gen_df['generated'] = True
+                gen_df['imageName'] = gen_df['imageName'].astype(str) + '.jpg'
+
+                # identify all collections with generated annotations
+                list_gen_collection = gen_df.collection.unique().tolist()
+                collections_ext += list_gen_collection
+
+                # concatenate
+                meta_df = pd.concat([meta_df, gen_df], ignore_index=True)
+
+        # drop outlier classes for now (dirty fix)
+        class_inlier_select = meta_df.train_label < 240
+        if np.any(~class_inlier_select):
+            print('Drop {} outlier samples!'.format(np.sum(~class_inlier_select)))
+            meta_df = meta_df[class_inlier_select]
+
+        # reset index
+        self.meta_df = meta_df.reset_index(drop=True)
+
+        # make sure there is width and height
+        self.meta_df['width'] = self.meta_df['x2'] - self.meta_df['x1'] + 1
+        self.meta_df['height'] = self.meta_df['y2'] - self.meta_df['y1'] + 1
+
+        # only keep the top_k most frequent classes
+        if top_k > 0:
+            top_labels = self.meta_df.label.value_counts()[:top_k].index.values
+            top_select = self.meta_df.label.isin(top_labels)
+            self.meta_df = self.meta_df[top_select].reset_index(drop=True)
+            if top_k > top_k_pick >= 0:
+                print(top_labels)
+                print('Only select samples from class {}'.format(top_labels[top_k_pick]))
+                class_select = self.meta_df.label == top_labels[top_k_pick]
+                self.meta_df = self.meta_df[class_select].reset_index(drop=True)
+
+        # all annotations are used
+        self.osd_valid_ind = self.meta_df.index
+
+        # crop pre-processing
+        # save longest side of each sign
+        self.meta_df['square'] = self.meta_df[['width', 'height']].max(axis=1)
+        # for each tablet compute median of longest side, and assign it to each sign
+        median_table = self.meta_df[self.meta_df.train_label > 0].groupby('imageName')[['square']].median()
+        self.meta_df = self.meta_df.join(median_table, on='imageName', rsuffix='_md')
+        # self.meta_df['square_new'] = self.meta_df[['square', 'square_md']].max(axis=1)
+
+        # pre-load all images
+        self.use_preload = True
+        if self.use_preload:
+            idx2path = {key: value for (key, value) in enumerate(self.meta_df['image_path'][self.osd_valid_ind].unique())}
+            path2idx = {value: key for key, value in iteritems(idx2path)}
+            self.meta_df['mem_idx'] = self.meta_df['image_path'].replace(path2idx)
+            self.image_data_list = []
+            for key, impath in tqdm(iteritems(idx2path), total=len(idx2path)):
+                im_ref = None
+                try:
+                    # due to memory constraints not .convert('RGB')
+                    im_ref = Image.open(impath).convert('L')
+                except IOError:
+                    # keep a None placeholder so that mem_idx stays aligned with image_data_list
+                    print('could not read image: {}'.format(impath))
+                self.image_data_list.append(im_ref)
+
+        # setup finished
+        print("Setup {} dataset spanning {} collections.".format(split, collections_ext))
+        num_segs = self.meta_df['image_path'].nunique()
+        print("Select {} bboxes from {} tablets.".format(len(self), num_segs))
+
+    def __getitem__(self, index):
+
+        # map index to csv index
+        csv_idx = self.osd_valid_ind[index]
+
+        impath = self.meta_df.iloc[csv_idx]['image_path']
+        target = self.meta_df.iloc[csv_idx]['train_label']
+
+        square = self.meta_df.iloc[csv_idx]['square']
+        square_md = self.meta_df.iloc[csv_idx]['square_md']
+
+        # load image data
+        if self.use_preload:
+            mem_idx = self.meta_df.iloc[csv_idx]['mem_idx']
+            im_ref = self.image_data_list[mem_idx]
+        else:
+            im_ref = None
+            try:
+                im_ref = Image.open(impath).convert('L')  # due to memory constraints not .convert('RGB')
+            except IOError:
+                print('could not read image: {}'.format(impath))
+
+        # bounding box meta
+        bb = [self.meta_df.iloc[csv_idx]['x1'], self.meta_df.iloc[csv_idx]['y1'],
+              self.meta_df.iloc[csv_idx]['x2'], self.meta_df.iloc[csv_idx]['y2']]
+
+        # context crop
+        context_pad = self.context_pad  # alternative: int(square * self.context_pad)
+
+        if self.pad_to_square:
+            # if background, context_pad = 0
+            if target == 0:
+                context_pad = 0
+            # if largest side of bbox is smaller than median of tablet, add additional context pad
+            elif square_md > square:
+                context_pad += (square_md - square) / 2.  # divide by 2, because w,h of im_pad grow by 2 * context_pad
+
+        # crop the (padded) bounding box from the tablet image
+        im, bb_pad = crop_pil_image(im_ref, bb, context_pad=context_pad, pad_to_square=self.pad_to_square)
+
+        # apply augmentation pipeline and convert from PIL to numpy
+        if self.transform is not None:
+            im = self.transform(im)
+
+        if self.target_transform is not None:
+            target = self.target_transform(target)
+
+        return im, target
+
+    def __len__(self):
+        return len(self.osd_valid_ind)
+
+
+def test(params, split='train', top_k=-1, top_k_pick=-1, pad_to_square=True, relative_path='../../'):
+    dataset = CuneiformCollection(params, relative_path=relative_path, split=split, top_k=top_k, top_k_pick=top_k_pick, pad_to_square=pad_to_square)
+
+    return dataset
@@ -0,0 +1,334 @@
+import torch
+import numpy as np
+import pandas as pd
+from PIL import Image
+from ast import literal_eval
+import os.path
+from tqdm import tqdm
+
+import torch.utils.data as data
+
+from ..detection.sign_detection import crop_segment_from_tablet_im
+
+# from utils.cython_bbox import bbox_overlaps
+from ..utils.bbox_utils import clip_boxes
+from ..utils.torchcv.transforms.crop_box import crop_box
+from ..utils.torchcv.transforms.resize import resize
+
+
+# helper functions
+
+def convert_bbox_global2local(gbbox, seg_bbox):
+    x, y = seg_bbox[:2]
+    relative_bbox = np.array(gbbox) - np.array([x, y, x, y])
+    return relative_bbox.tolist()
+
+
+def get_segment_meta(segment_rec):
+    image_name = segment_rec.tablet_CDLI
+
+    # this should control which scale is used in consecutive processing
+    scale = segment_rec.scale  #* self.rescale
+
+    seg_bbox = segment_rec.bbox
+    path_to_image = segment_rec.im_path
+    view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
+
+    return image_name, scale, seg_bbox, path_to_image, view_desc
+
+
+def bbox_ctr_overlaps(boxes1, boxes2):
+    # check for all combinations of boxes1 and boxes2 if ctrs of boxes2 are in boxes1
+    overlaps_mat = np.zeros([boxes1.shape[0], boxes2.shape[0]])
+    for ii, box in enumerate(boxes1):
+        x, y, x2, y2 = box
+        # check if center is still inside tile_box, otherwise ignore box
+        # if center is not inside tile box,
+        # not possible to get IoU >= 0.5 --> treated as background anyways
+        center = (boxes2[:, :2] + boxes2[:, 2:]) / 2
+        mask = (center[:, 0] >= x) & (center[:, 0] <= x2) \
+               & (center[:, 1] >= y) & (center[:, 1] <= y2)
+        overlaps_mat[ii, :] = mask
+    return overlaps_mat
+
+
+# Cuneiform SSD dataset
+
+
+class CuneiformSegments(data.Dataset):
+
+    def __init__(self, collections=['train'], transform=None, relative_path='../',
+                 only_annotated=True, only_assigned=True, preload_segments=True, use_gray_scale=True):
+
+        # merge multiple data sources in order to provide following function:
+        #  f(idx) -> image, boxes, labels
+        # uses gt annotations only
+        # if no annotations are available, boxes and labels are None
+
+        # transforms for data preparation
+        self.transform = transform
+        self.preload_segments = preload_segments
+        self.use_gray_scale = use_gray_scale
+
+        ### load and prepare list_sign_anno_df
+        # manual annotation files may be based on multiple collections
+        # for each collection
+        # store in list_sign_anno_df
+
+        # load bbox annotations
+        list_anno_collections = []
+        sign_anno_df_list = []
+        for collection in collections:
+            # load sign annotations
+            annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, collection)
+            # ATTENTION: only use gt annotations if collection is provided in collections parameter
+            if os.path.exists(annotation_file):
+                sign_anno_df = pd.read_csv(annotation_file, engine='python')  # read annotation file
+                # add additional columns
+                sign_anno_df['generated'] = False
+                sign_anno_df['global_segm_idx'] = -1
+                sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(literal_eval)
+                sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(np.array)  # convert to ndarray
+
+                # slice sign_anno_df if there are multiple different collections contained
+                for sub_collection in sign_anno_df.collection.unique():
+                    # store collection name
+                    list_anno_collections.append(sub_collection)
+                    # store collection specific slice of data frame
+                    sub_sign_anno_df = sign_anno_df[sign_anno_df.collection == sub_collection]
+                    sign_anno_df_list.append(sub_sign_anno_df)
+
+        ### extend collections
+        # create list of elementary collections
+        collections_ext = np.unique(list_anno_collections).tolist()
+
+        ###################
+        # II) on collection level: load annotations and meta data
+
+        ### load segment, sign meta information
+        # for each collection
+        # store in segments_df_list
+
+        # reduced set of columns - only keep what is needed and maintained
+        segments_df_columns = ['tablet_CDLI', 'view_desc', 'bbox', 'collection', 'scale', 'im_path']
+
+        segments_df_list = []
+        #sign_anno_df_list = []
+        for collection in collections_ext:
+
+            # load segment metadata
+            annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
+            tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
+            # convert string of list to list
+            tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
+            tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array)  # convert to ndarray
+            # add image path column
+            file_names = tablet_segments_df['tablet_CDLI'] + '.jpg'
+            tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + tablet_segments_df['collection'] + '/' + file_names
+            # get assigned segment (can be edited from outside without harm)
+            if only_assigned:
+                assigned_segments_df = tablet_segments_df[(tablet_segments_df.assigned == True)]
+            else:
+                assigned_segments_df = tablet_segments_df
+
+            # collect data frames in lists
+            segments_df_list.append(assigned_segments_df[segments_df_columns])
+
+
+        ### assemble ssd_segments_df with new index
+        # search all segments with annotations
+
+        list_segments_df_anno = []
+        for collection in collections_ext:
+            coll_idx = collections_ext.index(collection)
+
+            list_segm_indices = []
+            # get all segment indices for this collection that contain annotations
+            if only_annotated:
+                # if there are gt annotations
+                if collection in list_anno_collections:
+                    anno_coll_idx = list_anno_collections.index(collection)
+                    # if there are gt annotations
+                    if len(sign_anno_df_list[anno_coll_idx]) > 0:
+                        # load their indices
+                        segm_indices_anno = sign_anno_df_list[anno_coll_idx].segm_idx.unique()
+                        # filter annotations without assigned segment
+                        segm_indices_anno = segm_indices_anno[segm_indices_anno >= 0]
+                        list_segm_indices.append(segm_indices_anno)
+                # append only segments with anno
+                if len(list_segm_indices) > 0:
+                    # stack to obtain list of segment indices with annotations
+                    segm_indices = np.unique(np.hstack(list_segm_indices))
+                    # append
+                    list_segments_df_anno.append(segments_df_list[coll_idx].loc[segm_indices])
+            else:
+                # append all segments from collection
+                list_segments_df_anno.append(segments_df_list[coll_idx])
+
+        # create new datasets ssd_segment_df
+        # concat dataframes and use reset_index to create column with old indices
+        ssd_segments_df = pd.concat(list_segments_df_anno).reset_index()
+
+        # rename the column that holds the old index to segm_idx
+        ssd_segments_df = ssd_segments_df.rename(columns={ssd_segments_df.columns[0]: 'segm_idx'})
+
+
+        ###################
+        # III) on segment level: load data and prepare dataset index
+
+        ### assemble ssd_sign_anno_df and update ssd_segments_df
+        # additional column for ssd_sign_anno_df: global_segm_idx
+        # additional column for ssd_segments_df: with num_anno
+
+        sign_anno_df_cols = ['tablet_CDLI', 'mzl_label', 'train_label', 'segm_idx', 'collection',
+                             'generated', 'relative_bbox', 'global_segm_idx']
+        # segm_idx,tablet_CDLI,view_desc,collection,mzl_label,train_label,bbox,relative_bbox
+        list_ssd_sign_anno_df = []
+
+        list_lines_annotated_per_segm = np.zeros(len(ssd_segments_df), dtype=bool)
+        list_num_anno_per_segm = np.zeros(len(ssd_segments_df), dtype=int)
+
+        # iterate over segments
+        for global_seg_idx, seg_rec in ssd_segments_df.iterrows():
+            image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
+            res_name = "{}{}".format(image_name, view_desc)
+            segm_idx = seg_rec.segm_idx
+            collection = seg_rec.collection
+            # print(image_name, view_desc, segm_idx)
+            coll_idx = collections_ext.index(collection)
+
+            ### if annotations available for segment, append to list
+            if collection in list_anno_collections:
+                anno_coll_idx = list_anno_collections.index(collection)
+                if len(sign_anno_df_list[anno_coll_idx]) > 0:
+                    sign_anno_df = sign_anno_df_list[anno_coll_idx]
+                    # select sign annos for segment
+                    segm_select = sign_anno_df.segm_idx == segm_idx
+                    if len(sign_anno_df[segm_select]) > 0:
+                        # update data frame column
+                        sign_anno_df.loc[segm_select, 'global_segm_idx'] = global_seg_idx
+                        # collect information
+                        sign_anno_seg = sign_anno_df[segm_select]
+                        list_num_anno_per_segm[global_seg_idx] = len(sign_anno_seg)
+                        list_ssd_sign_anno_df.append(sign_anno_seg[sign_anno_df_cols])
+
+        # add columns to ssd_segments_df
+        ssd_segments_df['num_anno'] = np.array(list_num_anno_per_segm)
+
+        if len(list_ssd_sign_anno_df) > 0:
+            # assemble ssd_sign_anno_df (drop old index)
+            ssd_sign_anno_df = pd.concat(list_ssd_sign_anno_df, ignore_index=True)
+        else:
+            # create empty data frame with correct columns
+            ssd_sign_anno_df = pd.DataFrame(columns=sign_anno_df_cols)
+
+        ###################
+        # IV) Preload: segment images
+
+        ### preload segment images
+        # crop segment and convert to gray scale
+        # IMPORTANT: preload segment crops without scaling in order to save memory
+
+        image_data_list = []
+        if self.preload_segments:
+            # iterate over segments
+            for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
+                image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
+                res_name = "{}{}".format(image_name, view_desc)
+
+                # load composite image
+                pil_im = Image.open(image_path)
+                # crop segment
+                tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
+                # convert to gray scale and store in list
+                if self.use_gray_scale:
+                    # convert to gray scale
+                    image_data_list.append(tablet_seg.convert('L'))
+                else:
+                    image_data_list.append(tablet_seg)
+
+
+        ###################
+        # V) Dataset index
+
+        sample2tile_list = ssd_segments_df.index.values
+
+        ###################
+        # attach resulting data structures to class
+        self.collections = collections
+        self.collections_ext = collections_ext
+
+        self.ssd_segments_df = ssd_segments_df
+        self.ssd_sign_anno_df = ssd_sign_anno_df
+
+        self.image_data_list = image_data_list
+
+        # self.sign_anno_df_list = sign_anno_df_list
+        # self.segments_df_list = segments_df_list
+
+        self.sample2tile_list = sample2tile_list
+
+        # map from seg idx to dataset idx
+        self.sidx2didx = dict(zip(ssd_segments_df.segm_idx.values, range(len(ssd_segments_df))))
+
+        # setup finished
+        print("Setup dataset spanning {} collections with {} annotations [{} segments, {} indices]".format(
+            len(collections_ext), len(ssd_sign_anno_df), len(ssd_segments_df),  len(sample2tile_list)))
+
+    def __getitem__(self, index):
+        # get segment
+        global_seg_idx = self.sample2tile_list[index]
+        seg_rec = self.ssd_segments_df.loc[global_seg_idx]
+
+        # load segment meta data
+        image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
+
+        # get sign annos
+        select_segm = self.ssd_sign_anno_df.global_segm_idx == global_seg_idx
+        segm_annos = self.ssd_sign_anno_df[select_segm]
+
+        # get annotated boxes and their labels
+        if len(segm_annos) > 0:
+            seg_boxes = np.stack(segm_annos.relative_bbox)
+            labels = segm_annos.train_label.values
+            # convert to torch tensors
+            seg_boxes = torch.from_numpy(seg_boxes).float()
+            labels = torch.from_numpy(labels)
+        else:
+            seg_boxes = None
+            labels = None
+
+        # get segment image
+        if self.preload_segments:
+            pil_im = self.image_data_list[global_seg_idx]
+        else:
+            # load composite image
+            pil_im = Image.open(image_path)
+            # crop segment
+            tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
+            if self.use_gray_scale:
+                # convert to gray scale
+                pil_im = tablet_seg.convert('L')
+            else:
+                pil_im = tablet_seg
+
+        # tensor functions adapted from kuangliu's code
+        # https://github.com/kuangliu/torchcv/tree/master/torchcv/transforms
+
+        # scale segment
+        im, boxes = resize(pil_im, seg_boxes, None, scale=scale)
+
+        # apply augmentation pipeline and convert from PIL to numpy
+        if self.transform is not None:
+            im, boxes, labels = self.transform(im, boxes, labels)
+
+        return im, boxes, labels
+
+    def get_seg_rec(self, index):
+        # get segment
+        global_seg_idx = self.sample2tile_list[index]
+        return self.ssd_segments_df.loc[global_seg_idx]
+
+    def __len__(self):
+        return len(self.sample2tile_list)
+
@@ -0,0 +1,696 @@
+import numpy as np
+import pandas as pd
+from PIL import Image
+from ast import literal_eval
+import os.path
+from tqdm import tqdm
+
+import torch.utils.data as data
+
+from ..detection.sign_detection import *
+
+# from utils.cython_bbox import bbox_overlaps
+from ..utils.bbox_utils import clip_boxes
+from ..utils.transform_utils import convert2binaryPIL
+from ..utils.torchcv.transforms.crop_box import crop_box
+from ..utils.torchcv.transforms.resize import resize
+from ..utils.torchcv.transforms_lm.crop_box import crop_box_lm
+from ..utils.torchcv.transforms_lm.resize import resize_lm
+
+from .lines_dataset import collect_line_coords, create_line_trafo
+
+from ..detection.line_detection import compute_image_label_map
+
+
+# helper functions
+
+
+def convert_bbox_global2local(gbbox, seg_bbox):
+    x, y = seg_bbox[:2]
+    relative_bbox = np.array(gbbox) - np.array([x, y, x, y])
+    return relative_bbox.tolist()
+
+
+def get_segment_meta(segment_rec):
+    image_name = segment_rec.tablet_CDLI
+
+    # this controls which scale is used in subsequent processing
+    scale = segment_rec.scale  #* self.rescale
+
+    seg_bbox = segment_rec.bbox
+    path_to_image = segment_rec.im_path
+    view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
+
+    return image_name, scale, seg_bbox, path_to_image, view_desc
+
+
+def compute_tiles(imw, imh, scale, tile_shape=[600, 600], border_sz=100, w_step_sz=300, h_step_sz=400):
+    # TODO: improve using linspace and allow overlap to vary
+
+    # sign height should be around 130px, but sign length can be up to 300px
+    # -> overlap along lines (300px) should be larger than between lines (200px)
+    # -> for the step sizes this means: w_step_sz < h_step_sz
+    inv_scale = 1. / scale
+    tile_shape = np.array(tile_shape) * inv_scale
+    border_sz *= inv_scale
+    w_step_sz *= inv_scale
+    h_step_sz *= inv_scale
+
+    tile_ol_w = tile_shape[0] - w_step_sz
+    tile_ol_h = tile_shape[1] - h_step_sz  # tile_shape is [width, height]
+    w_list = np.arange(border_sz, imw - border_sz - tile_ol_w, step=w_step_sz)
+    h_list = np.arange(border_sz, imh - border_sz - tile_ol_h, step=h_step_sz)
+
+    # grid pts represent upper left corner of tile box
+    # tiles can be larger than image and need to be padded
+    XX, YY = np.meshgrid(w_list, h_list)
+
+    # compute bboxes
+    ul_corner = np.rint(np.stack([XX.ravel(), YY.ravel()], axis=1)).astype(int)
+    lr_corner = ul_corner + np.rint(tile_shape)
+    bboxes = np.hstack([ul_corner, lr_corner])
+    # make sure tiles inside image boundaries
+    bboxes = clip_boxes(bboxes, [imh, imw])  # [imh, imw] is correct order for this function
+
+    return bboxes, XX, YY
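+
+# Worked example (illustrative; assumes a hypothetical 2000x1500 image,
+# scale=1.0 and the default arguments):
+#   tile overlap:  600 - 300 = 300px along x,  600 - 400 = 200px along y
+#   x offsets:     np.arange(100, 2000 - 100 - 300, 300) -> 100, 400, 700, 1000, 1300
+#   y offsets:     np.arange(100, 1500 - 100 - 200, 400) -> 100, 500, 900
+#   => 5 x 3 = 15 tiles; the first tile spans [100, 100, 700, 700] before clipping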
+
+
+def bbox_ctr_overlaps(boxes1, boxes2):
+    # check for all combinations of boxes1 and boxes2 if ctrs of boxes2 are in boxes1
+    overlaps_mat = np.zeros([boxes1.shape[0], boxes2.shape[0]])
+    for ii, box in enumerate(boxes1):
+        x, y, x2, y2 = box
+        # check if center is still inside tile_box, otherwise ignore box
+        # if center is not inside tile box,
+        # not possible to get IoU >= 0.5 --> treated as background anyways
+        center = (boxes2[:, :2] + boxes2[:, 2:]) / 2
+        mask = (center[:, 0] >= x) & (center[:, 0] <= x2) \
+               & (center[:, 1] >= y) & (center[:, 1] <= y2)
+        overlaps_mat[ii, :] = mask
+    return overlaps_mat
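+
+# Worked example (illustrative values): one tile box and two sign boxes.
+#   boxes1 = np.array([[0, 0, 10, 10]])
+#   boxes2 = np.array([[2, 2, 6, 6], [8, 8, 16, 16]])
+# The centers of boxes2 are (4, 4) and (12, 12); only the first one lies
+# inside the tile, so bbox_ctr_overlaps(boxes1, boxes2) yields [[1., 0.]].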
+
+
+# Cuneiform SSD dataset
+
+
+class CuneiformSSD(data.Dataset):
+
+    def __init__(self, collections=['train'], gen_file_path=None, gen_collections=[], gen_folder=None, transform=None,
+                 relative_path='../', use_balanced_idx=True, tile_shape=[600, 600], use_linemaps=False,
+                 remove_empty_tiles=False, min_align_ratio=0.6, filter_nms=False, compl_thresh=-1, ncompl_thresh=-1,
+                 num_top_ncompl=0, min_ncompl_thresh=10):
+
+        # merge multiple data sources in order to form a single dataset that can be used for SSD style detector training
+        # provides following function:
+        #  f(idx) -> image, bboxes, labels
+        # or more general:
+        #  f(idx) -> image, bboxes, labels, line_map
+
+        # join multiple levels of supervision: three cases for sign annotations
+        # 1) tablets completely annotated (no need to load line annotations nor line detections)
+        # 2) tablets partly annotated and line annotations available (no need to load line detections)
+        # 3) tablets partly annotated and line detections required
+
+        # transforms for data preparation
+        self.transform = transform
+        self.line_model_version = None
+        self.use_linemaps = use_linemaps
+        self.min_align_ratio = min_align_ratio
+        self.filter_nms = filter_nms
+        self.compl_thresh = compl_thresh
+        self.ncompl_thresh = ncompl_thresh
+        self.num_top_ncompl = num_top_ncompl
+        self.min_ncompl_thresh = min_ncompl_thresh
+
+        line_model_version = 'v007'
+        num_classes = 240
+
+        ###################
+        # I) load generated and manual annotations
+
+        ### load and prepare gen_df
+        # generated annotations may be based on multiple collections
+
+        gen_cols = ['imageName', 'folder', 'image_path', 'label', 'train_label',
+                    'x1', 'y1', 'x2', 'y2', 'width', 'height', 'segm_idx',
+                    'line_idx', 'pos_idx', 'det_score', 'm_score', 'align_ratio', 'nms_keep', 'compl', 'ncompl']
+
+        # OPT I : use csv file that contains list of generated boxes
+        if gen_file_path:
+            gen_file_path = "{}results{}".format(relative_path, gen_file_path)
+            gen_df = pd.read_csv(gen_file_path, engine='python', header=None, names=gen_cols)
+        # OPT II : load the per-collection csv files and concatenate them
+        elif len(gen_collections) > 0:
+            assert gen_folder is not None, 'When using gen_collections, user needs to provide gen_folder!'
+            df_list = []
+            for gen_coll in gen_collections:
+                gen_file_path = "{}results/{}line_generated_bboxes_refined80_{}.csv".format(relative_path, gen_folder, gen_coll)
+                # regex delimiter (raw string) supports both the legacy (', ') and the new (',') formats
+                gen_df = pd.read_csv(gen_file_path, engine='python', delimiter=r',\s*', header=None, names=gen_cols)
+                df_list.append(gen_df)
+            gen_df = pd.concat(df_list, ignore_index=True)
+
+        # prepare gen_df
+        list_gen_collection = []
+        if gen_file_path or (len(gen_collections) > 0):
+
+            num_before_filter = len(gen_df)
+            # IMPORTANT: filter gen data according to align ratio
+            gen_df = gen_df[gen_df.align_ratio > self.min_align_ratio]
+            print('Align Ratio {} :: Removed {} samples. [{}]'.format(self.min_align_ratio, num_before_filter - len(gen_df), len(gen_df)))
+            num_before_filter = len(gen_df)
+            # only keep inlier classes [0, num_classes) (only required when using null hypos)
+            gen_df = gen_df[gen_df.train_label < num_classes]
+            print('Class Range {} :: Removed {} samples. [{}]'.format(num_classes, num_before_filter - len(gen_df), len(gen_df)))
+
+            # IMPORTANT: fill nan values in a way that avoids filtering
+            gen_df.nms_keep = gen_df.nms_keep.fillna(1).astype(bool)
+            gen_df.compl = gen_df.compl.fillna(50)
+            gen_df.ncompl = gen_df.ncompl.fillna(100)
+
+            num_before_filter = len(gen_df)
+            if self.filter_nms:
+                # filter using nms
+                gen_df = gen_df[gen_df.nms_keep]
+                print('NMS :: Removed {} samples. [{}]'.format(num_before_filter - len(gen_df), len(gen_df)))
+            num_before_filter = len(gen_df)
+
+            select_topn = False
+            if self.num_top_ncompl > 0:
+                # find the top num_top_ncompl samples of each class under a more relaxed ncompl condition
+                select_min_ncompl = (gen_df.ncompl > self.min_ncompl_thresh)  # necessary condition
+                index_list = gen_df[select_min_ncompl].groupby('train_label').ncompl.nlargest(self.num_top_ncompl).index.values
+                select_topn = gen_df.index.isin(np.stack(index_list)[:, 1])
+
+            if self.compl_thresh > -1:
+                # filter using compl
+                gen_df = gen_df[gen_df.compl > self.compl_thresh]   # 0, 2, 4, 5
+                print('Completeness {} :: Removed {} samples. [{}]'.format(self.compl_thresh, num_before_filter - len(gen_df), len(gen_df)))
+            elif self.ncompl_thresh > -1:
+                # filter using ncompl (normalized completeness), but always keep the top-n selection
+                gen_df = gen_df[(gen_df.ncompl > self.ncompl_thresh) | select_topn]   # 0, 2, 4, 5
+                print('Completeness (norm.) {} :: Removed {} samples. [{}]'.format(self.ncompl_thresh, num_before_filter - len(gen_df), len(gen_df)))
+            print('class sample count stats: ')
+            print(gen_df.train_label.value_counts().describe())
+
+            # add additional columns
+            gen_df['collection'] = gen_df.folder.str.split('/').str[0]
+            gen_df['generated'] = True
+            gen_df['global_segm_idx'] = -1
+            gen_df['relative_bbox'] = gen_df[['x1', 'y1', 'x2', 'y2']].values.tolist()
+            gen_df['relative_bbox'] = gen_df['relative_bbox'].apply(np.array)
+            gen_df['mzl_label'] = gen_df['label']
+            gen_df['tablet_CDLI'] = gen_df['imageName']
+
+            # identify all collections with generated annotations
+            list_gen_collection = gen_df.collection.unique().tolist()
+
+
+        ### load and prepare list_sign_anno_df
+        # manual annotation files may be based on multiple collections
+        # for each collection
+        # store in list_sign_anno_df
+
+        # load bbox annotations
+        list_anno_collections = []
+        sign_anno_df_list = []
+        for collection in collections:
+            # load sign annotations
+            annotation_file = '{}data/annotations/bbox_annotations_{}.csv'.format(relative_path, collection)
+            # ATTENTION: only use gt annotations if collection is provided in collections parameter
+            if os.path.exists(annotation_file):
+                sign_anno_df = pd.read_csv(annotation_file, engine='python')  # read annotation file
+                # add additional columns
+                sign_anno_df['generated'] = False
+                sign_anno_df['global_segm_idx'] = -1
+                sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(literal_eval)
+                sign_anno_df['relative_bbox'] = sign_anno_df['relative_bbox'].apply(np.array)  # convert to ndarray
+
+                # only keep inlier classes [0, num_classes)
+                class_inlier_select = sign_anno_df.train_label < num_classes
+                if not np.all(class_inlier_select):
+                    print('Drop {} outlier class samples from {}!'.format(np.sum(~class_inlier_select), collection))
+                    sign_anno_df = sign_anno_df[class_inlier_select]
+                # slice sign_anno_df if there are multiple different collections contained
+                for sub_collection in sign_anno_df.collection.unique():
+                    # store collection name
+                    list_anno_collections.append(sub_collection)
+                    # store collection specific slice of data frame
+                    sub_sign_anno_df = sign_anno_df[sign_anno_df.collection == sub_collection]
+                    sign_anno_df_list.append(sub_sign_anno_df)
+
+
+        ### extend collections
+        # create list of elementary collections
+        collections_ext = np.unique(list_gen_collection + list_anno_collections).tolist()
+        #collections_ext
+
+        ###################
+        # II) on collection level: load segments meta data and line annotation (optional)
+
+        ### load segment, line
+        # for each collection
+        # store in segments_df_list, line_anno_df_list
+
+        # reduced set of columns - only keep what is needed and maintained
+        # segments_df_columns = ['tablet_CDLI', 'view_desc', 'padded_bbox', 'collection', 'line_scale', 'scale',
+        #                        'im_path',
+        #                        'num_dets_hd', 'num_signs_visible']
+
+        segments_df_columns = ['tablet_CDLI', 'view_desc', 'bbox', 'collection', 'scale', 'im_path']
+
+        segments_df_list = []
+        line_anno_df_list = []
+        for collection in collections_ext:
+
+            # load segment metadata
+            annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
+            tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
+            # convert string of list to list
+            tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
+            tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array)  # convert to ndarray
+            # add additional columns
+            tablet_segments_df['imageName'] = tablet_segments_df['tablet_CDLI'] + '.jpg'
+            tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + \
+                                            tablet_segments_df['collection'] + '/' + tablet_segments_df['imageName']
+            # get assigned segment (can be edited from outside without harm)
+            assigned_segments_df = tablet_segments_df[tablet_segments_df.assigned == True]
+
+            # load line annotations
+            annotation_file = '{}data/annotations/line_annotations_{}.csv'.format(relative_path, collection)
+            if os.path.exists(annotation_file):
+                line_anno_df = pd.read_csv(annotation_file, engine='python')
+            else:
+                line_anno_df = []
+
+            # collect data frames in lists
+            segments_df_list.append(assigned_segments_df[segments_df_columns])
+            line_anno_df_list.append(line_anno_df)
+
+
+        ### assemble ssd_segments_df with new index
+        # search all segments with annotations
+
+        list_segments_df_anno = []
+        for collection in collections_ext:
+            coll_idx = collections_ext.index(collection)
+            #print(collection)
+
+            # get all segment indices for this collection that contain annotations
+            list_segm_indices = []
+
+            # if there are gt annotations
+            if collection in list_anno_collections:
+                anno_coll_idx = list_anno_collections.index(collection)
+                if len(sign_anno_df_list[anno_coll_idx]) > 0:
+                    # load their indices
+                    segm_indices_anno = sign_anno_df_list[anno_coll_idx].segm_idx.unique()
+                    # filter annotations without assigned segment
+                    segm_indices_anno = segm_indices_anno[segm_indices_anno >= 0]
+                    list_segm_indices.append(segm_indices_anno)
+
+            # if there are generated annotations
+            if collection in list_gen_collection:
+                # select gen annotations by collection
+                col_gen_df = gen_df[gen_df.collection == collection]
+                # load their indices
+                segm_indices_anno = col_gen_df.segm_idx.unique()
+                list_segm_indices.append(segm_indices_anno)
+
+            # stack to obtain list of segment indices with annotations
+            segm_indices = np.unique(np.hstack(list_segm_indices))
+
+            # append only segments with anno
+            if len(segm_indices) > 0:
+                list_segments_df_anno.append(segments_df_list[coll_idx].loc[segm_indices])
+
+        # create new datasets ssd_segment_df
+        # concat dataframes and use reset_index to create column with old indices
+        ssd_segments_df = pd.concat(list_segments_df_anno).reset_index()
+        # rename the column that holds the old index to segm_idx
+        ssd_segments_df = ssd_segments_df.rename(columns={ssd_segments_df.columns[0]: 'segm_idx'})
+
+
+        ###################
+        # III) on segment level: load data and prepare dataset index
+
+        ### assemble ssd_sign_anno_df and update ssd_segments_df
+        # make sure all annos have relative_bbox
+        # additional column for ssd_sign_anno_df: global_segm_idx
+        # add two columns to ssd_segments_df: with num_anno, with_line_anno
+        # type of annotation: full, partly_w_line_anno, partly_w_line_dect
+
+        # sign_anno_df_cols = ['imageName', 'image_path', 'label', 'train_label', 'segm_idx', 'collection',
+        #                      'generated', 'relative_bbox', 'global_segm_idx']
+        sign_anno_df_cols = ['tablet_CDLI', 'mzl_label', 'train_label', 'segm_idx', 'collection',
+                             'generated', 'relative_bbox', 'global_segm_idx']
+        list_ssd_sign_anno_df = []
+
+        list_lines_annotated_per_segm = np.zeros(len(ssd_segments_df), dtype=bool)
+        list_num_anno_per_segm = np.zeros(len(ssd_segments_df), dtype=int)
+
+        # iterate over segments
+        for global_seg_idx, seg_rec in ssd_segments_df.iterrows():
+            image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
+            res_name = "{}{}".format(image_name, view_desc)
+            collection = seg_rec.collection
+            segm_idx = seg_rec.segm_idx
+            coll_idx = collections_ext.index(collection)
+
+            ### if annotations available for segment, append to list
+            if collection in list_anno_collections:
+                anno_coll_idx = list_anno_collections.index(collection)
+                if len(sign_anno_df_list[anno_coll_idx]) > 0:
+                    sign_anno_df = sign_anno_df_list[anno_coll_idx]
+                    # select sign annos for segment
+                    segm_select = sign_anno_df.segm_idx == segm_idx
+
+                    if len(sign_anno_df[segm_select]) > 0:
+                        # update data frame column
+                        sign_anno_df.loc[segm_select, 'global_segm_idx'] = global_seg_idx
+                        # collect information
+                        sign_anno_seg = sign_anno_df[segm_select]
+                        list_num_anno_per_segm[global_seg_idx] = len(sign_anno_seg)
+                        list_ssd_sign_anno_df.append(sign_anno_seg[sign_anno_df_cols])
+
+            ### if generated annotations available, append to list
+            if collection in list_gen_collection:
+                # select sign annos for segment AND collection
+                segm_select = (gen_df.segm_idx == segm_idx) & (gen_df.collection == seg_rec.collection)
+                if len(gen_df[segm_select]) > 0:
+                    # update data frame columns
+                    gen_df.loc[segm_select, 'global_segm_idx'] = global_seg_idx
+                    # compute relative_bbox
+                    relative_boxes = gen_df[segm_select].relative_bbox.apply(
+                        lambda x: np.rint(convert_bbox_global2local(x, list(seg_bbox))).astype(int))
+                    gen_df.loc[segm_select, 'relative_bbox'] = relative_boxes
+
+                    # collect information
+                    sign_anno_seg = gen_df[segm_select]
+                    list_num_anno_per_segm[global_seg_idx] = len(sign_anno_seg)
+                    list_ssd_sign_anno_df.append(sign_anno_seg[sign_anno_df_cols])
+
+            ### check for line annotations
+            if len(line_anno_df_list[coll_idx]) > 0:
+                line_anno_df = line_anno_df_list[coll_idx]
+                # select line annos for segment
+                segm_select = line_anno_df.segm_idx == segm_idx
+                # if there are line annotations for segment
+                if len(line_anno_df[segm_select]) > 0:
+                    # assume all lines are annotated and remember type of line data
+                    list_lines_annotated_per_segm[global_seg_idx] = True
+
+        # add columns to ssd_segments_df
+        ssd_segments_df['num_anno'] = np.array(list_num_anno_per_segm)
+        ssd_segments_df['with_line_anno'] = list_lines_annotated_per_segm
+
+        # assemble ssd_sign_anno_df (drop old index)
+        ssd_sign_anno_df = pd.concat(list_ssd_sign_anno_df, ignore_index=True)
+
+        # this is deprecated, since bug fix
+        #assert np.sum(ssd_sign_anno_df.groupby('global_segm_idx').collection.nunique() > 1) == 0
+
+        ###################
+        # IV) Preload: segment images and line detections
+
+
+        ### preload segment images
+        # crop segment and convert to gray scale
+        # IMPORTANT: preload segment crops without scaling to save memory
+
+        image_data_list = []
+
+        # iterate over segments
+        for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
+            image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
+            res_name = "{}{}".format(image_name, view_desc)
+
+            # load composite image
+            pil_im = Image.open(image_path)
+            # crop segment
+            tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
+            # convert to gray scale and store in list
+            image_data_list.append(tablet_seg.convert('L'))
+
+
+
+        ### preload line detections
+        # could pre-compute line annotations->line map
+        # this is a speed/memory trade-off
+
+        line_detection_dict = {}
+        line_map_dict = {}
+
+        # only required if there are any generated detections
+        if self.use_linemaps:
+
+            # iterate over segments
+            for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
+                image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
+                res_name = "{}{}".format(image_name, view_desc)
+                # get collection idx
+                coll_idx = collections_ext.index(seg_rec.collection)
+                # get seg image shape
+                input_shape = np.array(image_data_list[global_seg_idx].size[::-1])
+
+                # if annotations are generated, need to create line map
+                #if seg_rec.collection in list_gen_collection:
+
+                # if no line annotations available
+                if True:  # ALWAYS use generated line detections; earlier conditions: `not seg_rec.with_line_anno`, `seg_rec.collection != 'train'`
+                    # either skeleton or lbl_ind
+                    line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, seg_rec.collection)
+                    lines_file = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
+                    # lines_file = "{}/{}_skeleton.npy".format(line_res_path, res_name)
+                    lbl_ind_x = np.load(lines_file).astype(int)
+                    # store in dictionary
+                    line_detection_dict[global_seg_idx] = lbl_ind_x
+
+                    # create line map from detections -> PIL binary
+                    lbl_im = create_line_map_from_line_det(line_detection_dict, global_seg_idx, scale, input_shape)
+
+                else:
+                    # create line map from line annotations -> PIL binary
+                    lbl_im = create_line_map_from_line_anno(line_anno_df_list, coll_idx, seg_rec.segm_idx, input_shape)
+
+                # resize to image size (do here or in next iter)
+                # lbl_im = lbl_im.resize(input_shape[::-1])
+
+                # store in dictionary
+                line_map_dict[global_seg_idx] = lbl_im
+
+
+        ###################
+        # V) Tiling
+
+        ### compute ssd_tile_df
+        list_tile_boxes = []
+        list_tile_support = []
+        list_tile_seg_idx = []
+
+        # iterate over segments
+        for global_seg_idx, seg_rec in tqdm(ssd_segments_df.iterrows(), total=len(ssd_segments_df)):
+            image_name, scale, seg_bbox, image_path, view_desc = get_segment_meta(seg_rec)
+            res_name = "{}{}".format(image_name, view_desc)
+
+            ## compute tiles
+            # get segment shape
+            imw, imh = image_data_list[global_seg_idx].size
+            # compute tile boxes
+            tile_boxes, _, _ = compute_tiles(imw, imh, scale, tile_shape=tile_shape)
+            # append
+            list_tile_boxes.append(tile_boxes)
+            list_tile_seg_idx.append([global_seg_idx] * len(tile_boxes))
+
+            ## check overlap of tile boxes and sign boxes
+            # get annotations
+            seg_sign_annos = ssd_sign_anno_df[ssd_sign_anno_df.global_segm_idx == global_seg_idx]
+            sign_bboxes = np.stack(seg_sign_annos.relative_bbox.values)
+
+            # OPT I: compute IOU
+            # tiles_sign_iou = bbox_overlaps(tile_boxes.astype(float), sign_bboxes.astype(float))
+            # tile_support = np.sum(tiles_sign_iou > 0.005, axis=1)  # 0.01 or 0.005
+
+            # OPT II: compute ctr overlap (strict)
+            tiles_sign_ctrs = bbox_ctr_overlaps(tile_boxes.astype(float), sign_bboxes.astype(float))
+            tile_support = np.sum(tiles_sign_ctrs, axis=1).astype(int)
+            list_tile_support.append(tile_support)
+
+        # stack tile boxes
+        tile_boxes_arr = np.vstack(list_tile_boxes)
+        tile_global_seg_idx = np.hstack(list_tile_seg_idx).astype(int)
+        tile_support_arr = np.hstack(list_tile_support)
+
+        # create tile_df
+        tile_df = pd.DataFrame({'global_segm_idx': tile_global_seg_idx,
+                                'tile_bbox': tile_boxes_arr.tolist(),
+                                'num_anno': tile_support_arr})
+
+        # OPTIONAL: filter tiles with little support
+        if remove_empty_tiles and not use_balanced_idx:
+            tile_df = tile_df[tile_df.num_anno > 0]    # 0
+            tile_df = tile_df.reset_index(drop=True)  # reset_index is not in-place, so assign the result
+
+        ###################
+        # VI) Dataset index
+
+        ## Balance sampling of tiles with anno per tile
+        # create a dataset index that is proportional to the number of annotations per tile
+        # attention: tiles without support will be ignored!
+        use_balanced_idx = use_balanced_idx    # good for debug
+
+        # 1) get tile factors
+        tile_factors = tile_df.num_anno.values
+        # 2) compute list to sample from
+        if use_balanced_idx:
+            sample2tile_list = []
+            for ii, tile_factor in enumerate(tile_factors):
+                sample2tile_list.extend([ii] * tile_factor)
+        else:
+            sample2tile_list = tile_df.index.values
+
+        ###################
+        # attach resulting data structures to class
+        self.collections = collections
+        self.collections_ext = collections_ext
+
+        self.ssd_segments_df = ssd_segments_df
+        self.ssd_sign_anno_df = ssd_sign_anno_df
+        self.tile_df = tile_df
+
+        self.image_data_list = image_data_list
+        # self.line_detection_dict = line_detection_dict
+        self.line_map_dict = line_map_dict
+
+        self.line_anno_df_list = line_anno_df_list
+        # self.sign_anno_df_list = sign_anno_df_list
+        # self.segments_df_list = segments_df_list
+
+        self.sample2tile_list = sample2tile_list
+
+        # setup finished
+        print("Setup dataset spanning {} collections with {} annotations [{} segments, {} tiles, {} indices]".format(
+            len(collections_ext), len(ssd_sign_anno_df), len(ssd_segments_df), len(tile_df), len(sample2tile_list)))
+
+    def __getitem__(self, index):
+        # get tile
+        tile_index = self.sample2tile_list[index]
+        tile_rec = self.tile_df.loc[tile_index]
+        tile_bbox = tile_rec.tile_bbox
+
+        # get segment
+        global_seg_idx = tile_rec.global_segm_idx
+        seg_rec = self.ssd_segments_df.loc[global_seg_idx]
+        coll_idx = self.collections_ext.index(seg_rec.collection)
+
+        # load segment meta data
+        image_name, scale, seg_bbox, path_to_image, view_desc = get_segment_meta(seg_rec)
+        with_line_anno = seg_rec.with_line_anno
+
+        # get segment image
+        pil_im = self.image_data_list[global_seg_idx]
+
+        # get sign annos
+        select_segm = self.ssd_sign_anno_df.global_segm_idx == global_seg_idx
+        segm_annos = self.ssd_sign_anno_df[select_segm]
+        seg_boxes = np.stack(segm_annos.relative_bbox)
+        labels = segm_annos.train_label.values
+        are_generated = segm_annos.generated.any()
+
+        # OPT II: tensor functions adapted from kuangliu's code
+        # https://github.com/kuangliu/torchcv/tree/master/torchcv/transforms
+
+        # convert to torch tensors
+        seg_boxes = torch.from_numpy(seg_boxes).float()
+        labels = torch.from_numpy(labels)
+
+        if self.use_linemaps:
+
+            if are_generated:
+                # incomplete annotations -> use line detections to avoid false negatives
+                lbl_im = self.line_map_dict[global_seg_idx]
+                # resize to crop
+                lbl_im = lbl_im.resize(pil_im.size)
+            else:
+                # assume all ground truth signs are annotated
+                # provide dummy label map
+                lbl_im = Image.new('1', pil_im.size, 0)
+
+            if False:
+                from skimage.color import label2rgb
+                import matplotlib.pyplot as plt
+
+                plt.figure(figsize=(10, 10))
+                # plt.imshow(lbl_ind)
+                plt.imshow(label2rgb(np.asarray(lbl_im), np.asarray(pil_im)))
+                plt.show()
+
+            # crop tile
+            # print pil_im.size, seg_boxes.shape, labels.shape, tile_bbox
+            im, boxes, labels, linemap = crop_box_lm(pil_im, seg_boxes, labels, lbl_im, tile_bbox)
+            # scale tile
+            im, boxes, linemap = resize_lm(im, boxes, linemap, None, scale=scale)
+
+            # apply augmentation pipeline and convert from PIL to numpy
+            if self.transform is not None:
+                im, boxes, labels, linemap = self.transform(im, boxes, labels, linemap)
+
+            return im, boxes, labels, linemap
+
+        else:
+
+            # crop tile
+            #print pil_im.size, seg_boxes.shape, labels.shape, tile_bbox
+            im, boxes, labels = crop_box(pil_im, seg_boxes, labels, tile_bbox)
+            # scale tile
+            im, boxes = resize(im, boxes, None, scale=scale)
+
+            # apply augmentation pipeline and convert from PIL to numpy
+            if self.transform is not None:
+                im, boxes, labels = self.transform(im, boxes, labels)
+
+            return im, boxes, labels
+
+    def __len__(self):
+        return len(self.sample2tile_list)
+
+
+# helper functions
+
+def create_line_map_from_line_anno(line_anno_df_list, coll_idx, segm_idx, input_shape):
+    line_height = 3
+
+    # select line annotations
+    line_anno_df = line_anno_df_list[coll_idx]
+    seg_line_df = line_anno_df[line_anno_df.segm_idx == segm_idx]
+    # # collect all line coordinates
+    rr, cc, lbboxes = collect_line_coords(seg_line_df, scale=1 / 16.)
+    # compute line trafo
+    line_trafo = create_line_trafo(rr, cc, input_shape // 16)  # integer division keeps the shape integral
+    # # compute masks
+    line_mask = line_trafo < line_height
+    # convert to binary PIL image
+    lbl_im = convert2binaryPIL(line_mask)
+
+    return lbl_im
+
+
+def create_line_map_from_line_det(line_detection_dict, global_seg_idx, scale, input_shape):
+    # get line detection
+    lbl_ind = line_detection_dict[global_seg_idx]
+    # compute line map
+    lbl_ind = compute_image_label_map(lbl_ind, np.array(input_shape * scale, dtype=int), padding=5)  # default:16, other padding=16 20 24
+    # convert to binary PIL image
+    lbl_im = convert2binaryPIL(lbl_ind)
+
+    return lbl_im
+
+
+# run test
+def test(collections=['train'], gen_collections=[], gen_folder=None, use_balanced_idx=True, use_linemaps=False,
+         remove_empty_tiles=False, min_align_ratio=0.2, relative_path='../../'):
+    ssd_dataset = CuneiformSSD(collections=collections, gen_file_path=None, gen_collections=gen_collections,
+                               gen_folder=gen_folder, relative_path=relative_path,
+                               use_balanced_idx=use_balanced_idx, tile_shape=[600, 600], use_linemaps=use_linemaps,
+                               remove_empty_tiles=remove_empty_tiles, min_align_ratio=min_align_ratio)
+    return ssd_dataset
@@ -0,0 +1,286 @@
+import os
+import numpy as np
+import pandas as pd
+from PIL import Image
+from ast import literal_eval
+
+from scipy import ndimage as ndi
+from skimage.util import invert
+from skimage.draw import line, line_aa
+
+import torch.utils.data as data
+from tqdm import tqdm
+
+
+from ..utils.bbox_utils import clip_boxes
+from ..utils.transform_utils import crop_pil_image
+from ..detection.sign_detection import *
+
+
+### helper functions
+
+def collect_line_coords(seg_line_df, scale=1):
+    # group according to line idx
+    grouped = seg_line_df.groupby('line_idx')
+
+    # collect all line coordinates
+    rr_list, cc_list, lbbox_list = [], [], []
+    for i, line_rec in grouped:
+        xx = np.rint(line_rec.x.values * scale).astype(int)
+        yy = np.rint(line_rec.y.values * scale).astype(int)
+        lbbox = np.array([np.min(xx), np.min(yy), np.max(xx), np.max(yy)])
+        lbbox_list.append(lbbox)
+        for li in range(len(xx) - 1):
+            rr, cc, _ = line_aa(yy[li], xx[li], yy[li + 1], xx[li + 1])
+            # rr, cc = line(yy[li], xx[li], yy[li+1], xx[li+1])
+            rr_list.append(rr)
+            cc_list.append(cc)
+
+    # stack coordinates
+    rr = np.hstack(rr_list)
+    cc = np.hstack(cc_list)
+    lbboxes = np.stack(lbbox_list)
+    return rr, cc, lbboxes
+
+
+def create_line_trafo(rr, cc, input_shape):
+    # create mask
+    line_mask = np.zeros(input_shape).astype(bool)
+    line_mask[rr, cc] = 1
+    # compute distance transform after inverting
+    line_trafo = ndi.distance_transform_edt(invert(line_mask))
+    return line_trafo
+
+
+def compute_sampling_freq(line_trafo, sample_mask, sample_radius, expo=2):
+    sample_freq = line_trafo
+    # convert to probs
+    sample_freq = (-sample_freq / sample_radius + 1) ** expo
+    # sample_freq = -sample_freq/sample_radius + 1
+    # sample_freq = np.exp(-sample_freq/sample_radius * 2)
+
+    # set area that is not sampled from to 'zero'
+    sample_freq[sample_mask < 1] = 0
+    return sample_freq
+
+
+def spatial_sample(sample_freq):
+    thresh = np.random.random_sample()
+    ylist, xlist = np.where(sample_freq > thresh)
+    select_idx = np.random.randint(len(xlist))
+    return xlist[select_idx], ylist[select_idx]
+
+
+def spatial_sample_negative(sample_freq):
+    # too slow
+    if 0:
+        # remove samples close to border
+        border_mask = np.zeros_like(sample_freq, dtype=bool)
+        bdist = 150
+        border_mask[bdist:-bdist, bdist:-bdist] = True
+        # apply masks
+        ylist, xlist = np.where((sample_freq == 0) & (border_mask))
+        select_idx = np.random.randint(len(xlist))
+        return xlist[select_idx], ylist[select_idx]
+    # faster
+    if 1:
+        # remove samples close to border
+        border_mask = np.zeros_like(sample_freq, dtype=bool)
+        bdist = 150
+        border_mask[bdist:-bdist, bdist:-bdist] = True
+        x, y = 0, 0
+        # (line_map[x, y] is True) results in overlap with hard negative samples
+        for i in range(100):
+            # pick coordinate
+            select_idx = np.random.randint(np.prod(sample_freq.shape))
+            # back to matrix index
+            x, y = np.unravel_index(select_idx, sample_freq.shape)
+            if (sample_freq[x, y] == 0) and border_mask[x, y]:
+                break
+        return y, x
+
+
+def pad_bboxes(lbboxes, context_pad):
+    # works in place, so no need to return
+    for bb in lbboxes:
+        bb[:2] = bb[:2] - context_pad
+        bb[2:4] = bb[2:4] + context_pad
+    # return lbboxes
+
+
+def spatial_sample_line(sample_freq, lbbox):
+    thresh = np.random.random_sample()
+    ylist, xlist = np.where(sample_freq[lbbox[1]:lbbox[3], lbbox[0]:lbbox[2]] >= thresh)
+    if len(xlist) == 0:
+        print(lbbox, sample_freq.shape)
+    select_idx = np.random.randint(len(xlist))
+    return lbbox[0] + xlist[select_idx], lbbox[1] + ylist[select_idx]
+
+
+### CuneiformLine Class
+
+class CuneiformLines(data.Dataset):
+
+    def __init__(self, dataset_params, transform=None, target_transform=None, relative_path='../', split='train'):
+        # annotation_path, params,
+
+        # set params
+        self.line_height = dataset_params['line_height']
+        self.sample_radius = dataset_params['sample_radius']  # self.line_height * 3
+        self.expo = dataset_params['expo']
+        if 'train' in split:
+            self.soft_bg_frac = dataset_params['soft_bg_frac'][0]
+        else:
+            self.soft_bg_frac = dataset_params['soft_bg_frac'][1]
+
+        self.crop_size = dataset_params['crop_size']
+        self.patch_size = dataset_params['patch_size']
+
+        # transforms for data preparation
+        self.transform = transform
+        self.target_transform = target_transform
+
+        # load line annotation
+        annotation_file = '{}data/annotations/line_annotations_{}.csv'.format(relative_path, split)
+        line_anno_df = pd.read_csv(annotation_file, engine='python')
+
+        # load segment metadata
+        annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, split)
+        tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
+        # convert string of list to list
+        tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
+        tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array)  # convert to ndarray
+        # additional columns
+        tablet_segments_df['imageName'] = tablet_segments_df['tablet_CDLI'] + '.jpg'
+        tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + \
+                                        tablet_segments_df['collection'] + '/' + tablet_segments_df['imageName']
+
+        # select assigned
+        assigned_segments_df = tablet_segments_df[tablet_segments_df.assigned == True]
+
+        # pre-load segments and compute line and sampling maps
+        self.valid_indices = []
+        self.num_lines_list = []
+        self.image_data_list = []
+        self.line_map_list = []
+        self.sample_freq_list = []
+        lbboxes_list = []
+        for segment_idx, segment_rec in tqdm(assigned_segments_df.iterrows(), total=len(assigned_segments_df)):
+            imageName = segment_rec.tablet_CDLI
+            scale = segment_rec.scale
+            seg_bbox = segment_rec.bbox
+            path_to_image = segment_rec.im_path
+            view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
+
+            # select line annotations
+            seg_line_df = line_anno_df[line_anno_df.segm_idx == segment_idx]
+
+            # check if any annotations available
+            if len(seg_line_df) > 0:
+                # print(split, imageName, view_desc)
+
+                ### 1) load segment
+                # prepare input tablet
+                pil_im = Image.open(path_to_image)
+                tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
+                # scale image
+                input_im = rescale_segment_single(tablet_seg, scale)
+                input_shape = input_im.size[::-1]
+
+                ### 2) line map
+                # compute interpolated line coordinates
+
+                # collect all line coordinates
+                rr, cc, lbboxes = collect_line_coords(seg_line_df, scale=scale)
+                # pad with sample radius
+                pad_bboxes(lbboxes, self.sample_radius)
+                clip_boxes(lbboxes, input_shape)
+                # compute line trafo
+                line_trafo = create_line_trafo(rr, cc, input_shape)
+                # compute masks
+                line_mask = line_trafo < self.line_height
+                sample_mask = line_trafo < self.sample_radius
+                # compute frequency
+                sample_freq = compute_sampling_freq(line_trafo, sample_mask, self.sample_radius, self.expo)
+
+                ### 3) save data
+                # append to list
+                self.valid_indices.append(segment_idx)
+                self.num_lines_list.append(len(seg_line_df.line_idx.unique()))
+                self.image_data_list.append(input_im)
+                self.line_map_list.append(line_mask)
+                self.sample_freq_list.append(sample_freq)
+                lbboxes_list.append(lbboxes)
+
+        # stack lbboxes
+        self.lbboxes = np.vstack(lbboxes_list)
+
+        self.line2mem_list = []
+        # for valid_idx, num_lines in zip(self.valid_indices, self.num_lines_list):
+        for mem_idx, num_lines in enumerate(self.num_lines_list):
+            self.line2mem_list.extend([mem_idx] * num_lines)
+
+        # Balance sampling with line length
+        # 1) get line factors by line width and normalisation
+        widths = self.lbboxes[:, 2] - self.lbboxes[:, 0]
+        # factor required to make the smallest line factor greater than or equal to 1
+        norm_factor_int = np.ceil(float(widths.sum()) / widths.min())
+        norm_widths = widths / float(widths.sum())
+        line_factors = np.rint(norm_factor_int * norm_widths).astype(int)
+        # 2) compute list to sample from
+        self.sample2line_list = []
+        for ii, line_factor in enumerate(line_factors):
+            self.sample2line_list.extend([ii] * line_factor)
+
+        # increase test set size to obtain more stable error
+        if split == 'test':
+            self.sample2line_list = self.sample2line_list * 5
+
+        # setup finished
+        print("Setup {} dataset with {} rows and {} samples".format(split, len(self.line2mem_list), len(self)))
+
+    def __getitem__(self, index):
+
+        # line_index = index
+        line_index = self.sample2line_list[index]
+
+        lbbox = self.lbboxes[line_index]
+        mem_idx = self.line2mem_list[line_index]
+        # get required data
+        segm_im = self.image_data_list[mem_idx]
+        line_map = self.line_map_list[mem_idx]
+        sample_freq = self.sample_freq_list[mem_idx]
+
+        if np.random.random() > self.soft_bg_frac:
+            # sample spatial location
+            # y, x = spatial_sample(sample_freq) # coordinates need to be inverted
+            y, x = spatial_sample_line(sample_freq, lbbox)
+            # compute target label
+            target = int(line_map[x, y])
+        else:
+            y, x = spatial_sample_negative(sample_freq)
+            # compute target label
+            target = int(line_map[x, y])  # should be always negative
+
+        # crop patch at sampled location (use PIL for that)
+        hw, hh = self.patch_size[0] / 2., self.patch_size[1] / 2.
+        bbox = [y - hw, x - hh, y + hw, x + hh]
+
+        # new fast
+        im, bb = crop_pil_image(segm_im, bbox, context_pad=0, pad_to_square=False)
+
+        # apply augmentation pipeline and convert from PIL to numpy
+        if self.transform is not None:
+            im = self.transform(im)
+
+        if self.target_transform is not None:
+            target = self.target_transform(target)
+
+        return im, target
+
+    def __len__(self):
+        # return total number of samples (line index balanced by line width)
+        # return len(self.sample_indices)
+        return len(self.sample2line_list)
+
+
@@ -0,0 +1,149 @@
+import numpy as np
+import pandas as pd
+from PIL import Image
+from ast import literal_eval
+from tqdm import tqdm
+
+import torch.utils.data as data
+
+
+from ..detection.sign_detection import crop_segment_from_tablet_im, rescale_segment_single
+from ..utils.torchcv.transforms.resize import resize
+
+
+class CuneiformSegments(data.Dataset):
+    # lightweight version of cunei_dataset_segments
+    # no annotations processing
+    # no preloading
+
+    def __init__(self, transform=None, target_transform=None, collection='train', collections=[],
+                 relative_path='../', rescale=1.0, only_assigned=True, preload_segments=False):
+
+        self.rescale = rescale
+        self.relative_path = relative_path
+        self.collection = collection
+        self.preload_segments = preload_segments
+
+        # transforms for data preparation
+        self.transform = transform
+        self.target_transform = target_transform
+
+        if len(collections) > 0:
+            # load segment metadata for multiple collections
+            df_list = []
+            for collection in collections:
+                annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
+                tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
+                df_list.append(tablet_segments_df)
+            # concatenate to single df
+            tablet_segments_df = pd.concat(df_list, ignore_index=True)
+        else:
+            # load segment metadata for single collection
+            annotation_file = '{}data/segments/tablet_segments_{}.csv'.format(relative_path, collection)
+            tablet_segments_df = pd.read_csv(annotation_file, engine='python', index_col=0)
+
+        # convert string of list to list
+        tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(literal_eval)
+        tablet_segments_df['bbox'] = tablet_segments_df['bbox'].apply(np.array)  # convert to ndarray
+        # add additional columns
+        tablet_segments_df['imageName'] = tablet_segments_df['tablet_CDLI'] + '.jpg'
+        tablet_segments_df['im_path'] = '{}data/images/'.format(relative_path) + \
+                                        tablet_segments_df['collection'] + '/' + tablet_segments_df['imageName']
+
+        # get assigned segment (can be edited from outside without harm)
+        if only_assigned:
+            self.assigned_segments_df = tablet_segments_df[(tablet_segments_df.assigned == True)]
+        else:
+            self.assigned_segments_df = tablet_segments_df
+
+        # make available for outside
+        self.tablet_segments_df = tablet_segments_df
+
+        self.image_data_list = []
+        self.sample2seg_list = []
+        self.sidx2didx = []
+        self.setup_sample_list()
+
+    def setup_sample_list(self, updated_df=None):
+        if updated_df is not None:
+            self.assigned_segments_df = updated_df
+
+        ### preload segment images
+        # crop segment and convert to gray scale
+        # IMPORTANT: preload segment crops without scaling to save memory
+        image_data_list = []
+        if self.preload_segments:
+            # iterate over segments
+            for seg_idx, seg_rec in tqdm(self.assigned_segments_df.iterrows(), total=len(self.assigned_segments_df)):
+                # load segment meta data
+                image_name, scale, seg_bbox, path_to_image, view_desc = self.get_segment_meta(seg_rec)
+                # prepare input tablet
+                pil_im = Image.open(path_to_image)
+                tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
+                # store in list
+                image_data_list.append(tablet_seg)
+        self.image_data_list = image_data_list
+
+        self.sample2seg_list = self.assigned_segments_df.index.values
+
+        # map from seg idx to dataset idx
+        self.sidx2didx = dict(zip(self.sample2seg_list, range(len(self.sample2seg_list))))
+
+        # setup finished
+        print("Setup {} dataset with {} elements".format(self.collection, len(self)))
+
+    def __getitem__(self, index):
+
+        seg_idx = self.sample2seg_list[index]
+        seg_rec = self.assigned_segments_df.loc[seg_idx]
+
+        # load segment meta data
+        image_name, scale, seg_bbox, path_to_image, view_desc = self.get_segment_meta(seg_rec)
+
+        # specify target
+        target = seg_idx
+
+        # get segment image
+        if self.preload_segments:
+            tablet_seg = self.image_data_list[index]
+        else:
+            # prepare input tablet
+            pil_im = Image.open(path_to_image)
+            tablet_seg, new_bbox = crop_segment_from_tablet_im(pil_im, seg_bbox)
+
+        # scale image
+        if 0:
+            # scale image
+            im = rescale_segment_single(tablet_seg, scale)
+        else:
+            # convert to gray scale
+            # tablet_seg = tablet_seg.convert('L')
+            # scale segment
+            im, _ = resize(tablet_seg, None, None, scale=scale)
+
+        # apply augmentation pipeline and convert from PIL to numpy
+        if self.transform is not None:
+            im = self.transform(im)
+
+        if self.target_transform is not None:
+            target = self.target_transform(target)
+
+        return im, target
+
+    def __len__(self):
+        # return total number of segments
+        return len(self.assigned_segments_df)
+
+
+    def get_segment_meta(self, segment_rec):
+        image_name = segment_rec.tablet_CDLI
+
+        # this should control which scale is used in consecutive processing
+        scale = segment_rec.scale * self.rescale
+
+        seg_bbox = segment_rec.bbox
+        path_to_image = segment_rec.im_path
+        view_desc = "{}".format(segment_rec.view_desc).replace("nan", "")
+
+        return image_name, scale, seg_bbox, path_to_image, view_desc
+
@@ -0,0 +1,725 @@
+import numpy as np
+
+import matplotlib.pyplot as plt
+import matplotlib.ticker as ticker
+from matplotlib import cm
+
+import PIL.Image as Image
+from ..utils.transform_utils import crop_pil_image
+
+from ..evaluations.config import cfg
+from ..utils.bbox_utils import clip_boxes
+
+
+def visualize_net_output_single(im, predicted, cunei_id=30, num_classes=None, min_prob=0.95):
+    # visualize output of single crop detections
+
+    if num_classes is None:
+        num_classes = cfg.TEST.NUM_CLASSES
+
+    # cross-image products
+    output = np.mean(predicted, axis=0)
+    # cross-channel products
+    lbl_ind = np.argmax(output, axis=0)
+
+    ctr_crop = predicted[0, ...]
+
+    plt.figure(figsize=(16, 24))
+    plt.subplot(4, 2, 1)
+    plt.imshow(im, cmap=cm.Greys_r)
+    plt.title('input')
+    plt.subplot(4, 2, 2)
+    plt.imshow(ctr_crop.squeeze()[cunei_id, ...])
+    plt.colorbar()
+    plt.title('class #{}'.format(cunei_id))
+    plt.subplot(4, 2, 3)
+    cmap = plt.get_cmap('Paired')
+    plt.imshow(lbl_ind, cmap=cmap, vmin=0, vmax=num_classes)
+    plt.colorbar()
+    plt.title('argmax class')
+    plt.subplot(4, 2, 4)
+    test = np.argmax(ctr_crop.squeeze(), axis=0)
+    test[np.max(ctr_crop.squeeze(), axis=0) < min_prob] = 0
+    plt.imshow(test, cmap=cmap, vmin=0, vmax=num_classes)
+    plt.colorbar()
+    plt.title('argmax class ( {} confidence)'.format(min_prob))
+
+
+def _refine_detections(predicted):
+    # single image product from center crop
+    ctr_crop = predicted[4, ...]
+    # cross-image products
+    output = np.mean(predicted, axis=0)
+    max_output = np.max(predicted, axis=0)
+    uncertainty = np.var(predicted, axis=0)
+    # cross-channel products
+    lbl_ind = np.argmax(output, axis=0)
+    average_unc = np.mean(uncertainty, axis=0)
+    min_average_unc = np.min(average_unc)
+    max_average_unc = np.max(average_unc)
+    max_unc = np.max(uncertainty)
+
+    # save products
+    # sio.savemat('results/test_tablet_cuneiNet_{}_{}_scale_{}_{}.mat'.format(training_round, imageName, scale, negatives_used),
+    #     {#'probs':ctr_crop,
+    #      #'pred_labels': np.argmax(ctr_crop,axis=0),
+    #      #'entropy': -np.sum(ctr_crop * np.log(ctr_crop)),
+    #      'predicted': predicted,
+    #      'avg_probs': output,
+    #      'avg_unc': average_unc,
+    #      'avg_pred_labels': lbl_ind})
+
+    return ctr_crop, output, max_output, uncertainty, lbl_ind, average_unc, min_average_unc, max_average_unc, max_unc
+
+
+def visualize_net_output(im, predicted, cunei_id=30, num_classes=None):
+    # visualize output of 5 star crop detections
+
+    if num_classes is None:
+        num_classes = cfg.TEST.NUM_CLASSES
+
+    ctr_crop, output, max_output, uncertainty, \
+    lbl_ind, average_unc, min_average_unc, max_average_unc, max_unc = _refine_detections(predicted)
+
+    plt.figure(figsize=(16, 24))
+    plt.subplot(3, 2, 1)
+    plt.imshow(im, cmap=cm.Greys_r)
+    plt.title('input')
+    plt.subplot(3, 2, 2)
+    plt.imshow(ctr_crop.squeeze()[cunei_id, ...])
+    plt.colorbar()
+    plt.title('class #{}'.format(cunei_id))
+    plt.subplot(3, 2, 3)
+    cmap = plt.get_cmap('Paired')
+    plt.imshow(np.argmax(ctr_crop.squeeze(), axis=0), cmap=cmap, vmin=0, vmax=num_classes)
+    plt.colorbar()
+    plt.title('argmax class')
+    plt.subplot(3, 2, 4)
+    test = np.argmax(ctr_crop.squeeze(), axis=0)
+    test[np.max(ctr_crop.squeeze(), axis=0) < 0.95] = 0
+    plt.imshow(test, cmap=cmap, vmin=0, vmax=num_classes)
+    plt.colorbar()
+    plt.title('argmax class (0.95 confidence)')
+    #plt.subplot(4, 2, 5)
+    #cmap = plt.get_cmap('Paired')
+    #plt.imshow(lbl_ind, cmap=cmap, vmin=0, vmax=num_classes)
+    #plt.colorbar()
+    #plt.title('avg argmax class')
+    plt.subplot(3, 2, 5)
+    plt.imshow(average_unc, vmin=0, vmax=max_average_unc)
+    plt.colorbar()
+    plt.title('shift induced uncertainty')
+    plt.subplot(3, 2, 6)
+    # entropy (small epsilon avoids log(0) for zero probabilities)
+    plt.imshow(-np.sum(ctr_crop.squeeze() * np.log(ctr_crop.squeeze() + 1e-12), axis=0))
+    plt.colorbar()
+    plt.title('entropy')
+
+
+def _im_to_pyra_coords(pyra, boxes):
+    # boxes is N x 4 where each row is a box in the image specified
+    # by [x1 y1 x2 y2].
+    #
+    # Output is a list where element i holds the image boxes
+    # mapped into pyramid level i
+    boxes = boxes - 1
+    pyra_boxes = []
+    for level in range(pyra['num_levels']):
+        level_boxes = boxes * pyra['scales'][level]
+        level_boxes = np.round(level_boxes / pyra['stride'])
+        level_boxes = level_boxes
+        # add padding
+        level_boxes[:, 0] = level_boxes[:, 0] + pyra['padx']
+        level_boxes[:, 2] = level_boxes[:, 2] + pyra['padx']
+        level_boxes[:, 1] = level_boxes[:, 1] + pyra['pady']
+        level_boxes[:, 3] = level_boxes[:, 3] + pyra['pady']
+        pyra_boxes.append(level_boxes)
+    return pyra_boxes
+
+
+def _pyra_to_im_coords(pyra, boxes):
+    # boxes is N x 5 where each row is a box in the format [x1 y1 x2 y2 pyra_level]
+    # where (x1, y1) is the upper-left corner of the box in pyramid level pyra_level
+    # and (x2, y2) is the lower-right corner of the box in pyramid level pyra_level
+    # Assumes 1-based indexing.
+    # pyramid to im scale factors for each scale
+
+    scales = pyra['stride'] / pyra['scales'][0]
+
+    # pyramid to im scale factors for each pyra level in boxes
+    if len(scales.shape) > 0:
+        scales = scales[boxes[:, -1].astype(int)]
+
+    # Remove padding from pyramid boxes
+    boxes[:, 0] = boxes[:, 0] - pyra['padx']
+    boxes[:, 2] = boxes[:, 2] - pyra['padx']
+    boxes[:, 1] = boxes[:, 1] - pyra['pady']
+    boxes[:, 3] = boxes[:, 3] - pyra['pady']
+
+    im_boxes = boxes[:, :4] * scales
+    return im_boxes
+
+
+def _pyramid_patch_box(x1, y1, feat_map_sz, pyra, lvl_idx, opt='A'):
+    # compute image patch box coordinates in original image
+    # should also work for all features of one image at once
+    # REQUIREMENTS:
+    # position of feature in feature map:                   x1, y1
+    # dimension of feature map:                             feat_map_sz
+    # scale of input image (relative to original scale):    pyra, lvl_idx
+    # stride that is determined by network architecture:    pyra
+
+    # OPTION A
+    if opt == 'A':
+        boxes = np.array([x1 - 0.5, y1 - 0.5, x1 + 0.5, y1 + 0.5]).transpose([1, 0])
+        boxes = np.concatenate([boxes, np.tile(lvl_idx, [len(x1), 1])], axis=1)
+    # OPTION B - more accurate
+    elif opt == 'B':
+        x_step = (feat_map_sz[1] - 1) / float(feat_map_sz[1])
+        y_step = (feat_map_sz[0] - 1) / float(feat_map_sz[0])
+
+        boxes = np.array(
+            [(x1 - 0.5) * x_step, (y1 - 0.5) * y_step, (x1 + 0.5) * x_step, (y1 + 0.5) * y_step]).transpose([1, 0])
+        boxes = np.concatenate([boxes, np.tile(lvl_idx, [len(x1), 1])], axis=1)
+
+    im_patch_box = np.floor(_pyra_to_im_coords(pyra, boxes))
+    return im_patch_box
+
+
+def _pyramid_rf_box(im_sz, im_patch_box, rf_size, scales, lvl_idx):
+    # compute receptive field box coordinates in original image
+    # (given patch_box coordinates in original image)
+    # REQUIREMENTS:
+    # receptive field size determined by network architecture:  rf_size [H, W]
+    # original image size:                                      im_sz
+    # patch box size in original image:                         im_patch_box
+    # scale of input image relative to original image:          scales, lvl_idx
+
+    scaled_rf_sz = rf_size / scales[lvl_idx]
+
+    im_rf_box = np.zeros_like(im_patch_box)
+    im_rf_box[:, 0] = im_patch_box[:, 0] - scaled_rf_sz[1] / 2.
+    im_rf_box[:, 1] = im_patch_box[:, 1] - scaled_rf_sz[0] / 2.
+    im_rf_box[:, 2] = im_patch_box[:, 2] + scaled_rf_sz[1] / 2.
+    im_rf_box[:, 3] = im_patch_box[:, 3] + scaled_rf_sz[0] / 2.
+
+    # should not be required!!
+    # im_rf_box = clip_boxes(im_rf_box, im_sz)
+
+    return np.round(im_rf_box)
+
+
+def compute_bbox_grids(map_shape, im_shape, arch_type='alexnet'):
+    # stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
+    if arch_type == 'alexnet':
+        pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
+                'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]}  # [227 rf] 195, 227
+    else:
+        pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
+                'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]}  # [227 rf] 195, 227
+
+        # pyra = {'stride': 16, 'num_levels': 1, 'scales': np.array([1.0]),
+        #         'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]}  # [227 rf] 195, 227
+
+    # blobs in caffe and images in opencv are H,W formatted. This results in YX format
+    # since all bounding boxes use XY convention, this needs to be taken into account when accessing images or blobs
+    x = np.arange(0, map_shape[1])
+    y = np.arange(0, map_shape[0])
+    xv, yv = np.meshgrid(x, y, sparse=False, indexing='xy')
+
+    # print lbl_ind.shape, im.shape, xv.shape
+
+    # compute basic patch boxes
+    # each score in the score map corresponds to a single non-overlapping box
+    patch_boxes = _pyramid_patch_box(xv.flatten(), yv.flatten(), map_shape, pyra, 0, opt='A') + pyra[
+        'offset']
+
+    # compute receptive field sized boxes (overlapping)
+    # due to the way pyramid_rf_box is implemented one stride needs to be subtracted from rf_size
+    rf_sz = np.array(pyra['rf_size'], dtype=int)
+    rf_boxes = _pyramid_rf_box(im_shape, patch_boxes, rf_sz - pyra['stride'], pyra['scales'], 0)
+
+    return patch_boxes, rf_boxes
+
+
+def label_map2image(feat_x, feat_y, map_shape, arch_type='alexnet'):
+    # stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
+    if arch_type == 'alexnet':
+        pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
+                'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]}  # [227 rf] 195, 227
+    else:
+        pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
+                'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]}  # [227 rf] 195, 227
+
+    # compute basic patch boxes
+    # each score in the score map corresponds to a single non-overlapping box
+    patch_boxes = _pyramid_patch_box(feat_x, feat_y, map_shape, pyra, 0, opt='A') + pyra[
+        'offset']
+
+    return patch_boxes
+
+
+def radius_in_image(feat_radius, dim=0, arch_type='alexnet'):
+    # dim defines along which dimension to compute (only important, if rf_size not square)
+
+    # stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
+    if arch_type == 'alexnet':
+        pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
+                'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]}  # [227 rf] 195, 227
+    else:
+        pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
+                'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]}  # [227 rf] 195, 227
+
+    rf_sz = np.array(pyra['rf_size'], dtype=int)
+
+    # compute radii in image
+    patch_radius = feat_radius * pyra['stride']
+    rf_radius = patch_radius + (rf_sz[dim] - pyra['stride'])
+    return patch_radius, rf_radius
+
+
+def coord_in_image(coord, add_rf=False, arch_type='alexnet'):
+    # stride, offset and receptive field [H, W] come from external excel spreadsheet calculation
+    if arch_type == 'alexnet':
+        pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
+                'padx': 0, 'pady': 0, 'offset': 113, 'rf_size': [227, 227]}  # [227 rf] 195, 227
+    else:
+        pyra = {'stride': 32, 'num_levels': 1, 'scales': np.array([1.0]),
+                'padx': 0, 'pady': 0, 'offset': 112, 'rf_size': [224, 224]}  # [227 rf] 195, 227
+
+    rf_sz = np.array(pyra['rf_size'], dtype=int)
+
+    # compute coordinate in image
+    im_coord = coord * pyra['stride'] + pyra['offset']
+    if add_rf:
+        im_coord += (rf_sz[0] - pyra['stride'])
+    return im_coord
+
+
+def _bbox_transform_inv(boxes, deltas):
+    if boxes.shape[0] == 0:
+        return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
+
+    boxes = boxes.astype(deltas.dtype, copy=False)
+
+    widths = boxes[:, 2] - boxes[:, 0] + 1.0
+    heights = boxes[:, 3] - boxes[:, 1] + 1.0
+    ctr_x = boxes[:, 0] + 0.5 * widths
+    ctr_y = boxes[:, 1] + 0.5 * heights
+
+    dx = deltas[:, 0::4]
+    dy = deltas[:, 1::4]
+    dw = deltas[:, 2::4]
+    dh = deltas[:, 3::4]
+
+    pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
+    pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
+    pred_w = np.exp(dw) * widths[:, np.newaxis]
+    pred_h = np.exp(dh) * heights[:, np.newaxis]
+
+    pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
+    # x1
+    pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
+    # y1
+    pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
+    # x2
+    pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
+    # y2
+    pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
+
+    return pred_boxes
+
+
+def nms(dets, thresh):
+    x1 = dets[:, 0]
+    y1 = dets[:, 1]
+    x2 = dets[:, 2]
+    y2 = dets[:, 3]
+    scores = dets[:, 4]
+
+    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
+    order = scores.argsort()[::-1]
+
+    keep = []
+    while order.size > 0:
+        i = order[0]
+        keep.append(i)
+        xx1 = np.maximum(x1[i], x1[order[1:]])
+        yy1 = np.maximum(y1[i], y1[order[1:]])
+        xx2 = np.minimum(x2[i], x2[order[1:]])
+        yy2 = np.minimum(y2[i], y2[order[1:]])
+
+        w = np.maximum(0.0, xx2 - xx1 + 1)
+        h = np.maximum(0.0, yy2 - yy1 + 1)
+        inter = w * h
+        ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+        inds = np.where(ovr <= thresh)[0]
+        order = order[inds + 1]
+
+    return keep
+
+
+def crop_bboxes_from_im(im, bboxes, context_pad=0, is_pil=False):
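+    """
+    Crop every bounding box from the image and return the crops as a list.
+    im: image to crop from (ndarray; a PIL image is only supported with is_pil=True and context_pad > 0)
+    bboxes: N x 4 array of bounding box coordinates as xmin, ymin, xmax, ymax.
+    """
+    # iterate over boxes
+    im_crop_list = []
+    for i in range(bboxes.shape[0]):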
+        # format bbox
+        bbox = np.round(bboxes[i, :]).astype(int)
+        # crop bbox from image
+        if context_pad <= 0:
+            im_crop = im[bbox[1]:bbox[3], bbox[0]:bbox[2]]
+        else:
+            if is_pil:
+                im_crop = np.asarray(crop_pil_image(im, bbox, context_pad=context_pad)[0])
+            else:
+                im_crop = np.asarray(crop_pil_image(Image.fromarray(im), bbox, context_pad=context_pad)[0])  # Image.fromarray(im, 'L')
+        # append to list
+        im_crop_list.append(im_crop)
+    return im_crop_list
+
+
+def apply_bbox_regression(predicted_roi, rf_boxes, im_shape, num_classes=None, with_star_crop=True):
+    if num_classes is None:
+        num_classes = cfg.TEST.NUM_CLASSES
+
+    ## if use_bbox_reg:
+    # select roi deltas
+    if with_star_crop:
+        roi_deltas = predicted_roi[4, ...].reshape([num_classes * 4, -1]).transpose()
+    else:
+        roi_deltas = predicted_roi.reshape([num_classes * 4, -1]).transpose()
+    # apply bounding-box regression deltas
+    pred_boxes = _bbox_transform_inv(rf_boxes, roi_deltas)
+    # make sure everything stays inside its limits
+    pred_boxes = clip_boxes(pred_boxes, im_shape)
+    return pred_boxes
+
+
+def _split_detections(detections, boxes, axis=1, nsplits=2, sid=1):
+    assert axis in (1, 2)
+    boxes_split = np.array_split(boxes, nsplits, axis=axis-1)
+    dets_split = np.array_split(detections, nsplits, axis=axis)
+    # reshape to original format
+    det_vec = dets_split[sid].reshape([dets_split[sid].shape[0], -1]).transpose()
+    box_vec = boxes_split[sid].reshape([-1, boxes_split[sid].shape[-1]])
+    return det_vec, box_vec
+
+
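+def split_detections(detections, pred_boxes, rf_boxes, lbl_map_shape,
+                     split_axis='h', nsplits=2, sid=1, num_classes=None):
+    if split_axis not in ('h', 'v'):
+        raise ValueError("split_axis must be 'h' (horizontal) or 'v' (vertical)")
+    if num_classes is None:
+        num_classes = cfg.TEST.NUM_CLASSES
+    # horizontal split -> axis 1, vertical split -> axis 2
+    axis = 1 if split_axis == 'h' else 2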
+    # split detections
+    if cfg.TEST.BBOX_REG:
+        det_vec, box_vec = _split_detections(detections,
+                                             pred_boxes.reshape(list(lbl_map_shape) + [4 * num_classes]),
+                                             axis=axis, nsplits=nsplits, sid=sid)
+    else:
+        det_vec, box_vec = _split_detections(detections,
+                                             rf_boxes.reshape(list(lbl_map_shape) + [4]),  # one [x1, y1, x2, y2] box per position
+                                             axis=axis, nsplits=nsplits, sid=sid)
+
+    return det_vec, box_vec
+
+
+def vis_detections(im, bboxes, scores=None, labels=None, thresh=0.3, max_vis=20, figs_sz=(14, 14), ax=None):
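+    """
+    Visualize bounding boxes on top of the input image, including labels / scores.
+    im: input image
+    bboxes: ndarray of bounding boxes, one [x1, y1, x2, y2, ...] row per box
+    scores: optional ndarray of scores with length equal to bboxes.shape[0]
+    labels: class name (str) shared by all boxes, or array of integer labels with length equal to bboxes.shape[0]
+    thresh: boxes with score <= thresh are skipped; max_vis: maximum number of boxes considered
+    """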
+    if scores is None:
+        nvis = min(max_vis, bboxes.shape[0])
+    else:
+        assert len(scores) == bboxes.shape[0]
+        inds = np.where(scores > thresh)[0]
+        nvis = min(max_vis, len(inds))
+
+    # return if no bboxes to visualize
+    if nvis == 0:
+        return
+
+    # plot base figure
+    if ax is None:
+        fig, ax = plt.subplots(1, 1, figsize=figs_sz)
+
+    ax.imshow(im, cmap=cm.Greys_r)
+
+    # iterate over bboxes and add them
+    for i in range(nvis):
+        bbox = bboxes[i, :4]
+        # deal with scores
+        if scores is not None:
+            score = scores[i]
+            # only show boxes with score above threshold
+            if score <= thresh:
+                continue
+        # deal with labels
+        if labels is None or isinstance(labels, str):
+            # single class name (or no label) shared by all boxes
+            cls_name = labels or ''
+            title_txt = labels or 'X'
+        else:
+            # else assume index array
+            assert len(labels) == bboxes.shape[0]
+            cls_name = '{:.0f}'.format(labels[i])
+            title_txt = 'X'
+
+        # plt.cla()
+        # plt.imshow(im, cmap = cm.Greys_r)
+        ax.add_patch(
+            plt.Rectangle((bbox[0], bbox[1]),
+                          bbox[2] - bbox[0],
+                          bbox[3] - bbox[1], fill=False,
+                          edgecolor='blue', alpha=0.5, linewidth=2.0)
+        )
+        if scores is None:
+            ax.text(bbox[0], bbox[1] - 2, '{:s}'.format(cls_name),
+                      bbox=dict(facecolor='blue', alpha=0.4), fontsize=8, color='white')
+            ax.set_title('{}'.format(cls_name))
+        else:
+            ax.text(bbox[0], bbox[1] - 2, '{:s} {:.2f}'.format(cls_name, score),
+                      bbox=dict(facecolor='blue', alpha=0.4), fontsize=8, color='white')
+            ax.set_title('{}  {:.3f}'.format(cls_name, score))
+        ax.set_title('{} detections with p({} | box) >= {:.2f}'.format(nvis, title_txt, thresh), fontsize=14)
+    ax.xaxis.set_major_locator(ticker.MultipleLocator(200))
+    ax.yaxis.set_major_locator(ticker.MultipleLocator(200))
+    # plt.axis('off')
+    # plt.tight_layout()
+    # plt.draw()
+
+
+def scale_detection_boxes(boxes, scale_factor):
+    # scale boxes depending on scale factor
+    return boxes * scale_factor
+
+
+def correct_for_shift(boxes, correction):
+    # correct shift due to oversampling
+    # in order to correct ground truth boxes, e.g. center crop -> subtract half the shift from gt boxes
+    # in order to correct detection boxes, e.g. center crop -> add half the shift to detection boxes
+    return boxes + correction
+
+
+def reverse_scaling(rf_boxes, pred_boxes, scaling=1):
+    # if used, should be applied right before post-processing detections
+
+    # reverse scaling of detection boxes
+    rf_boxes = scale_detection_boxes(rf_boxes, scaling)
+    pred_boxes = scale_detection_boxes(np.array(pred_boxes), scaling)
+
+    return rf_boxes, pred_boxes
+
+
+def reverse_shift_and_scaling(rf_boxes, pred_boxes, shift=0, scaling=1):
+    # if used, should be applied right before post-processing detections
+
+    # correct shift of detection boxes due to center crop
+    rf_boxes = correct_for_shift(rf_boxes, shift)
+    pred_boxes = correct_for_shift(np.array(pred_boxes), shift)
+
+    # reverse scaling of detection boxes
+    rf_boxes = scale_detection_boxes(rf_boxes, scaling)
+    pred_boxes = scale_detection_boxes(np.array(pred_boxes), scaling)
+
+    return rf_boxes, pred_boxes
+
+
+def post_process_detections(scores, pred_boxes, rf_boxes, num_classes=None, use_bbox_reg=None, nms_thresh=None):
+    # apply nms and filter low confidence boxes
+    # return list of good candidates
+    # all detections are collected into:
+    #    all_boxes[cls][image] = N x 5 array of detections in
+    #    (x1, y1, x2, y2, score)
+    if num_classes is None:
+        num_classes = cfg.TEST.NUM_CLASSES
+    if use_bbox_reg is None:
+        use_bbox_reg = cfg.TEST.BBOX_REG
+    if nms_thresh is None:
+        nms_thresh = cfg.TEST.NMS
+
+    score_min_thresh = cfg.TEST.SCORE_MIN_THRESH
+    score_bg_thresh = cfg.TEST.SCORE_BG_THRESH
+
+    num_images = 1
+
+    all_boxes = [[[] for _ in range(num_images)]
+                 for _ in range(num_classes)]  # xrange vs range
+
+    for i in range(num_images):  # xrange vs range
+        # load image and get detections from network
+        # [.....]
+        # skip j = 0, because it's the background class
+        for j in range(1, num_classes):  # xrange vs range
+            # selection of boxes before NMS
+            inds = np.where((scores[:, j] > score_min_thresh) & (scores[:, 0] < score_bg_thresh))[0]
+            cls_scores = scores[inds, j]
+            if use_bbox_reg:
+                cls_boxes = pred_boxes[inds, j * 4:(j + 1) * 4]  # bbox regression
+            else:
+                cls_boxes = rf_boxes[inds, :]  # without bbox regression
+            cls_dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])) \
+                .astype(np.float32, copy=False)
+            # apply nms suppression
+            keep = nms(cls_dets, nms_thresh)
+            cls_dets = cls_dets[keep, :]
+            all_boxes[j][i] = cls_dets
+
+    return all_boxes
+
+
+def get_all_bboxes(all_boxes):
+    # take all_boxes (per-class lists of N x 5 detections for a single image)
+    # return per-class list of detections sorted by score, including bbox, score, and class label
+    num_classes = len(all_boxes)
+    dets_list = [[] for _ in range(num_classes)]
+    for j in range(1, num_classes):
+        if len(all_boxes[j][0]) > 0:
+            # get boxes
+            BB = all_boxes[j][0]
+            confidence = all_boxes[j][0][:, -1]
+            # sort boxes by confidence
+            sorted_ind = np.argsort(-confidence)
+            # sorted_scores = np.sort(-confidence)
+            BB = BB[sorted_ind, :]
+            # append together with class label
+            dets_list[j] = np.concatenate([BB, np.tile(j, reps=(BB.shape[0], 1))], axis=1)
+    # lists from different classes are returned separately (not concatenated)
+    return dets_list
+
+
+def get_detection_bboxes(detections, all_boxes):
+    # take detections and all_boxes
+    # return enriched list of detections including bbox, score, and max label
+    num_classes = len(all_boxes)
+    dets_list = [[] for _ in range(num_classes)]
+    for j in range(1, num_classes):
+        if len(detections[j][0]) > 0:
+            # get boxes
+            BB = all_boxes[j][0]
+            confidence = all_boxes[j][0][:, -1]
+            # sort boxes by confidence
+            sorted_ind = np.argsort(-confidence)
+            # sorted_scores = np.sort(-confidence)
+            BB = BB[sorted_ind, :]
+            # select detections (indices require sorted detections - since sorted in evaluate)
+            inds = detections[j][0]
+            # append together with class label
+            dets_list[j] = np.concatenate([BB[inds, :], np.tile(j, reps=(inds.shape[0], 1))], axis=1)
+    # lists from different classes are returned separately (not concatenated)
+    return dets_list
+
+
+def collect_detection_crops(input_im, dets_list, max_vis=5, context_pad=0):
+    # take tablet(input_im) and list of bboxes(dets_list)
+    # return cropped patches(dets_crops)
+    num_classes = len(dets_list)
+    dets_crops = [[] for _ in range(num_classes)]
+    for j in range(1, num_classes):
+        if len(dets_list[j]) > 0:
+            # get boxes
+            cls_dets = dets_list[j]  # select class list
+            bboxes = cls_dets[:, :4]  # remove any additional dims
+            ncrops = min(max_vis, bboxes.shape[0])
+            dets_crops[j] = crop_bboxes_from_im(input_im, bboxes[:ncrops, ...], context_pad)
+    return dets_crops
+
+
+def plot_crop_list(dets_crops, gt_crops, scores=None, k=8, cls_label='', figs_sz=(14, 4.5), context_pad=0):
+    # plot co-detections of a single class
+    # can handle dets_crops and gt_crops together or both on their own
+    nvis = min(len(dets_crops), k)
+    ngt = len(gt_crops)
+    if nvis > 0:
+        # slice crops and scores
+        top_list = dets_crops[:nvis]
+        top_vals = scores[:nvis] if scores is not None else [float('nan')] * nvis
+
+        # prepare subplots (nvis or nvis + 1)
+        fig, axes = plt.subplots(1, nvis + (ngt > 0), figsize=figs_sz, squeeze=False)  # , gridspec_kw={'wspace': 1}
+        axes = axes.ravel()
+
+        # plot idx
+        pid = 0
+
+        # plot ground truth in front if available
+        if ngt > 0:
+            axes[pid].imshow(gt_crops[0], cmap=cm.Greys_r)
+            axes[pid].set_yticks([])
+            axes[pid].set_xticks([])
+            axes[pid].set_title("gt [{}]".format(cls_label))
+            pid += 1
+
+        # iterate over top_list
+        for i, imcrop in enumerate(top_list):
+            axes[pid + i].imshow(imcrop, cmap=cm.Greys_r)
+            axes[pid + i].set_yticks([])
+            axes[pid + i].set_xticks([])
+            bbox_props = dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8)
+            # if there is no gt, add class label to title in first plot
+            if pid + i == 0 and ngt == 0:
+                axes[pid + i].set_title("class [{}] #{} p(x)={:.1f}".format(cls_label, i + 1, top_vals[i]))
+            else:
+                axes[pid + i].set_title("#{} p(x)={:.1f}".format(i + 1, top_vals[i]))
+
+            if context_pad > 0:
+                imh, imw = imcrop.shape[:2]
+                bbox = [context_pad, context_pad, imw - context_pad, imh - context_pad]
+                axes[pid + i].add_patch(plt.Rectangle((bbox[0], bbox[1]),
+                                                      bbox[2] - bbox[0], bbox[3] - bbox[1],
+                                                      fill=False, edgecolor='blue', linestyle='-',
+                                                      alpha=0.3, linewidth=2.0))
+
+    elif ngt > 0:
+        nvis = 1
+        # plot top k
+        fig, axes = plt.subplots(1, nvis, figsize=figs_sz, squeeze=False)
+        axes = axes.ravel()
+
+        top_list = [gt_crops[0]] * nvis
+        top_vals = [1] * len(top_list)
+        for pid, imcrop in enumerate(top_list):
+            axes[pid].imshow(imcrop, cmap=cm.Greys_r)
+            bbox_props = dict(boxstyle="round", fc="w", ec="0.5", alpha=0.8)
+            axes[pid].set_title("gt [{}]".format(cls_label))
+            axes[pid].set_yticks([])
+            axes[pid].set_xticks([])
+
+
+def convert_detections_to_array(all_boxes, img_idx=0, idx_column=None):
+    #    all_boxes[cls][image] = N x 5 array of detections in
+    #    (x1, y1, x2, y2, score)
+    total_labels = len(all_boxes)  # all_boxes.shape[0]
+    temp = [0, 0, 0, 0, 0, 0, 0, 0, 0]  # [ID, cx, cy, score, x1, y1, x2, y2, idx]
+    detections_arr = np.zeros((0, 9))
+    idx = 0
+    # convert to CLS, cx, cy, score
+    for i in range(total_labels):
+        for box in all_boxes[i][img_idx]:
+            temp[0] = i
+            temp[1] = (box[2] + box[0]) / 2
+            temp[2] = (box[3] + box[1]) / 2
+            temp[3] = box[4]
+            temp[4:8] = box[0:4]
+            if idx_column is None:
+                temp[8] = idx
+            else:
+                temp[8] = idx_column[idx]
+            idx += 1
+            detections_arr = np.vstack((detections_arr, temp))
+    # SORT BY SCORE!?
+    return detections_arr
+
@@ -0,0 +1,519 @@
+import numpy as np
+import pandas as pd
+
+import torch
+from torchvision import transforms as trafos
+
+from skimage import draw
+from skimage.color import label2rgb
+from skimage.transform import hough_line, hough_line_peaks, probabilistic_hough_line
+from skimage.morphology import skeletonize, skeletonize_3d, thin, medial_axis, watershed
+
+from scipy import ndimage as ndi
+from scipy.spatial.distance import pdist, cdist, squareform
+
+from ..detection.detection_helpers import label_map2image, coord_in_image
+
+
+# prepare input for line detection
+
+def preprocess_line_input(pil_im, scale, shift=None):
+    """ produces five copies of the segment at slightly different offsets
+
+    :param pil_im: tablet segment that is to be processed
+    :param scale: scale which should be used for resizing
+    :param shift: offset shift used to produce five-fold oversampling
+    :return: 4D tensor with 5xCxHxW
+    """
+    if shift is None:
+        shift = 0  # cfg.TEST.SHIFT
+    # compute scaled size
+    imw, imh = pil_im.size
+    imw = int(imw * scale)
+    imh = int(imh * scale)
+    # determine crop size
+    crop_sz = [int(imw - shift), int(imh - shift)]
+    # tensor-space transforms
+    ts_transform = trafos.Compose([
+        trafos.ToTensor(),
+        trafos.Normalize(mean=[0.5], std=[1]), # normalize
+    ])
+    # compose transforms
+    tablet_transform = trafos.Compose([
+            trafos.Lambda(lambda x: x.convert('L')), # convert to gray
+            trafos.Resize((imh, imw)), # resize according to scale
+            trafos.FiveCrop((crop_sz[1], crop_sz[0])), # oversample
+            trafos.Lambda(
+                lambda crops: torch.stack([ts_transform(crop) for crop in crops])), # returns a 4D tensor
+        ])
+    # apply transforms
+    im_list = tablet_transform(pil_im)
+    return im_list
+
+
+def apply_detector(inputs, model_fcn, device):
+
+    with torch.no_grad():  # faster, less memory usage
+        inputs = inputs.to(device)
+        # apply network
+        output = model_fcn(inputs)
+        # convert to numpy
+        output = output.data.cpu().numpy()
+        return output
+
+
+# prepare transliteration for line detection
+
+def prepare_transliteration(tl_df, num_lines, stats):
+    """
+    ATTENTION: this filters the transliteration according to status!
+    """
+
+    # prepare transliteration for line detection
+    if num_lines > 0:
+        # only visible/not broken
+        tl_df = tl_df[tl_df.status > 0]
+        # compute line length
+        tl_df = tl_df.groupby('line_idx').apply(compute_line_length_from_tl, stats)
+        # get line statistics
+        num_vis_lines = tl_df.line_idx.nunique()  # num visible lines (not broken lines)
+        # len_lines = tl_df.groupby('line_idx').pos_idx.count()
+        len_lines = tl_df.line_idx.value_counts()
+        len_min, len_max = len_lines.min(), len_lines.max()
+    else:
+        # TODO: if no tl info available, use initial line detection results to set these parameters
+        len_min, len_max = 4, 12
+        num_vis_lines = 40
+
+    return tl_df, num_vis_lines, len_min, len_max
+
+
+# extract lines with hough transform
+
+def compute_hough_transform(line_det_map1, line_det_map2, re_focus_angle=True):
+    # focus theta for cuneiform horizontal lines
+    theta_range = np.linspace(np.deg2rad(83), np.deg2rad(97), 50)
+    # theta_range = np.linspace(np.deg2rad(-90) ,np.deg2rad(90), 180) # normal range
+
+    # Classic straight-line Hough transform (usually angles from -90 to +90)
+    h, theta, d = hough_line(line_det_map1, theta=theta_range)
+
+    # debug
+    # plt.imshow(np.log(1 + h), extent=[np.rad2deg(theta[-1]), np.rad2deg(theta[0]), d[-1], d[0]], cmap='gray', aspect=1/1.5)
+    # plt.show()
+
+    # focus angle and re-run (theta_range2 falls back to theta_range if not re-focused)
+    theta_range2 = theta_range
+    if re_focus_angle:
+        accum, angles, dists = hough_line_peaks(h, theta, d, min_distance=1, min_angle=16, num_peaks=50)
+
+        # get median angle
+        m_angle = np.median(np.rad2deg(angles))
+        # modify theta
+        theta_range = np.linspace(np.deg2rad(m_angle - 2), np.deg2rad(m_angle + 2), 50)
+        theta_range2 = np.linspace(np.deg2rad(m_angle - 3), np.deg2rad(m_angle + 3), 50)
+        # Classic straight-line Hough transform (usually angles from -90 to +90)
+        h, theta, d = hough_line(line_det_map2, theta=theta_range)
+
+    return h, theta, d, theta_range, theta_range2
+
+
+# group lines together that are "close"
+
+def shoelace_formula(points):
+    ''' compute area of polygon according to the shoelace formula
+    requires ordering of point coordinates
+    https://en.wikipedia.org/wiki/Shoelace_formula
+
+    :param points: 2xn matrix, where n is number of points (points need to be ordered!!)
+    :return: area of polygon
+    '''
+    area = 0
+    dmat = np.ones((2, 2))
+    for i in range(points.shape[1]):
+        dmat[:, 0] = points[:, i]
+        dmat[:, 1] = points[:, (i+1) % points.shape[1]]
+        area += np.linalg.det(dmat.transpose())
+    return np.abs(area) / 2.
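+
+# Illustrative usage (a sketch, not part of the original module): the unit square
+# with corners ordered counter-clockwise should give an area of about 1.
+#   pts = np.array([[0., 1., 1., 0.],   # x coordinates
+#                   [0., 0., 1., 1.]])  # y coordinates
+#   shoelace_formula(pts)  # approximately 1.0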
+
+
+def area_between_two_line_segments(spt1, spt2, lpt1, lpt2):
+    # compute area between line segments
+    # assume: line segments do not intersect and
+    # assume: pts should be ordered according to x-axis
+    # this means a valid order would be [spt1, spt2, lpt2, lpt1]
+    return shoelace_formula(np.stack([spt1, spt2, lpt2, lpt1], axis=1))
+
+
+def nearby_and_near_parallel_2(l1, l2, interline_distance, interval=(0, 10)):
+    # compute area between line segments over interval
+    angle1, rad1 = l1
+    angle2, rad2 = l2
+    spt1, spt2 = line_pts_from_polar_line(angle1, rad1, x0=interval[0], x1=interval[1])
+    lpt1, lpt2 = line_pts_from_polar_line(angle2, rad2, x0=interval[0], x1=interval[1])
+    # use shoelace method
+    area = area_between_two_line_segments(spt1, spt2, lpt1, lpt2)
+    # check threshold
+    interval_interline_area = interline_distance * np.abs(interval[1] - interval[0])
+
+    # print area, interval_interline_area / 2.
+    if area < interval_interline_area / 2.:
+        return True
+    else:
+        return False
+
+
+def nearby_and_near_parallel(l1, l2, interline_distance):
+    # simple filter
+    angle1, rad1 = l1
+    angle2, rad2 = l2
+    if np.abs(rad1 - rad2) < interline_distance/2. and np.abs(np.rad2deg(angle1-angle2)) < 1.0:
+        return True
+    else:
+        return False
+
+
+def do_intersect_in_interval(l1, l2, interval):
+    # y = mx+c or in parametric form
+    # \rho = x \cos \theta + y \sin \theta
+    # \rho (radius) perpendicular distance from origin to the line
+    # \theta is the angle formed by this perpendicular line
+
+    angle1, rad1 = l1
+    angle2, rad2 = l2
+    lower, upper = interval
+
+    quotient = (np.cos(angle1) - np.cos(angle2))
+
+    if quotient == 0:  # same angles
+        if np.abs(rad1 - rad2) < 3:  # same radius
+            return True
+        else:
+            return False
+    else:
+        # compute x coordinate of intersection (approximation: assumes sin(angle) ~ 1, i.e. near-horizontal lines)
+        x_intersect = (rad1 - rad2) / quotient
+
+        # inside interval
+        if (x_intersect >= lower) and (x_intersect <= upper):
+            return True
+        else:
+            return False
+
+
+def compute_group_labels_from_dists(X_dist):
+    # assign labels to groups
+    # iterate over pairwise distances and propagate each label to its neighbours
+
+    # get squareform
+    XX = squareform(X_dist)
+    # set dummy labels
+    labels = -np.ones(XX.shape[0])
+    # label lines while checking for neighbourhood
+    for ii in range(len(labels)):
+        if labels[ii] == -1:
+            labels[ii] = ii
+        # each row in squareform indicates potential neighbors
+        for idx in np.where(XX[ii, :] > 0)[0]:
+            labels[idx] = labels[ii]
+    return labels
+
+
+# associate lines with line segments
+
+
+def line_pts_from_polar_line(angle, dist, x0=0, x1=10):
+    # computes two points defining a line from polar line representation
+    x0, x1 = x0 * np.ones_like(angle), x1 * np.ones_like(angle)  # x0 = np.zeros_like(angle)
+    y0 = (dist - x0 * np.cos(angle)) / np.sin(angle)
+    y1 = (dist - x1 * np.cos(angle)) / np.sin(angle)
+    return (x0, y0), (x1, y1)
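+
+# Illustrative usage (a sketch, not part of the original code): for angle = pi / 2
+# the polar line is horizontal, so both points share (approximately) y = dist.
+#   line_pts_from_polar_line(np.pi / 2, 5.)  # approximately ((0, 5), (10, 5))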
+
+
+def line_params_from_pts(lpt1, lpt2):
+    # compute parameters a, b for line representation y = a * x + b
+    a = (lpt2[1] - lpt1[1])/float(lpt2[0] - lpt1[0])
+    b = lpt1[1] - lpt1[0] * a
+    return a, b
+
+
+def normal_form_from_pts(p, q):
+    # takes two points and computes the normal form of the line through them
+    # https://de.wikipedia.org/wiki/Normalenform#Aus_der_Zweipunkteform
+
+    # normal
+    n = np.array([-(q[1]-p[1]),
+                  q[0]-p[0]], dtype=float)
+    # normalize
+    n_0 = n / np.linalg.norm(n)
+    # distance from origin
+    dist = np.dot(n_0, p)
+    return n_0, dist
+
+
+def hess_normal_form_from_pts(p, q):
+    n_0, dist = normal_form_from_pts(p, q)
+    # angle in rad
+    rad = np.arctan(n_0[1]/n_0[0])
+    return rad, dist
+
+
+def _offset_pt_to_normal_form_line(pt, n_0, dist):
+    return np.dot(n_0, pt) - dist
+
+
+def _shift_pt_to_normal_form_line(pt, n_0, shift_dist):
+    return pt + n_0 * shift_dist
+
+
+def clip_pt_using_normal_form(pt, n_0, dist, min_dist):
+    # compute offset
+    offset_line = _offset_pt_to_normal_form_line(pt, n_0, dist)
+    # check if correction necessary
+    if np.abs(offset_line) > min_dist:
+        # compute correction
+        if offset_line >= 0:
+            correction = offset_line - min_dist
+        else:
+            correction = offset_line + min_dist
+        # apply correction
+        pt = _shift_pt_to_normal_form_line(pt, n_0, -correction)
+    return pt
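+
+# Illustrative usage (a sketch with made-up values, not part of the original code):
+#   n_0, dist = normal_form_from_pts((0., 2.), (4., 2.))  # horizontal line y = 2: n_0 ~ [0, 1], dist = 2
+#   clip_pt_using_normal_form(np.array([1., 5.]), n_0, dist, min_dist=1.)  # -> [1., 3.]
+# The point lies 3 units from the line, so it is shifted along the normal until it is min_dist away.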
+
+
+def clip_bbox_using_line(bbox, line_pts_arr, min_dist=128/2.):
+    # get normal form of line
+    n_0, dist = normal_form_from_pts(line_pts_arr[0], line_pts_arr[1])
+    # compute distance to line and decide if pt needs to be shifted
+    pt_list = []
+    # iterate over two bounding box coordinates
+    for pt in [bbox[:2], bbox[2:]]:
+        pt = clip_pt_using_normal_form(pt, n_0, dist, min_dist)
+        pt_list.append(pt)
+
+    return np.concatenate(pt_list)
+
+
+def clip_bbox_using_line_segmentation(bbox, line_pts_arr, skeleton, min_dist=128/2.):
+    # get normal form of line
+    n_0, dist = normal_form_from_pts(line_pts_arr[0], line_pts_arr[1])
+
+    # use bbox boundaries to crop pts from line segmentation
+    # seg_line_pts = np.nonzero(skeleton[:, int(bbox[0]):int(bbox[2])])[0]
+    # faster but more exclusive (probably worth the speedup)
+    seg_line_pts = np.nonzero(skeleton[int(bbox[1]):int(bbox[3]), int(bbox[0]):int(bbox[2])])[0] + int(bbox[1])
+
+    dist_delta = 0
+    if len(seg_line_pts) > 3:
+        # compute average y location of segmentation line pts
+        seg_line_cy = np.mean(seg_line_pts)
+        # determine local distance delta from linear model to skeleton
+        pt = [(bbox[0] + bbox[2]) / 2., seg_line_cy]
+        dist_delta = _offset_pt_to_normal_form_line(pt, n_0, dist)
+        # correct normal form of line [n_0, dist] using delta, ie. alter dist
+        dist = dist + dist_delta
+    #print dist_delta, dist - dist_delta, dist
+
+    # compute distance to line and decide if pt needs to be shifted
+    pt_list = []
+    # iterate over two bounding box coordinates
+    for pt in [bbox[:2], bbox[2:]]:
+        pt = clip_pt_using_normal_form(pt, n_0, dist, min_dist)
+        pt_list.append(pt)
+
+    return np.concatenate(pt_list)
+
+
+def dist_pt_line(pt, lpt1, lpt2):
+    # compute squared 'perpendicular distance'
+    # pt is point
+    # lpt are line points
+    # returns squared (perpendicular) distance from point to line
+
+    # assumes line representation of form y = a * x + b
+    (a, b) = line_params_from_pts(lpt1, lpt2)
+    # from: energy based geometric model fitting (2010)
+    # https://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line#Another_formula
+    return (np.abs(pt[1]-a*pt[0]-b)/np.sqrt(a**2+1))**2
+
+
+def dist_lineseg_line(spt1, spt2, lpt1, lpt2):
+    # computes the distance between line (unbounded) and line segment (bounded)
+    # spt are line segment points
+    # lpt are line points
+    # returns minimum squared distance from the line segment end points to the line
+    return min(dist_pt_line(spt1, lpt1, lpt2),
+               dist_pt_line(spt2, lpt1, lpt2))
+
+
+def assign_line_segments_to_lines(line_segs, line_hypos, x1=10):
+    # get line pts from polar lines
+    polar_lines = line_hypos.groupby('label').mean()[['angle', 'dist']].values
+    line_pts = line_pts_from_polar_line(polar_lines[:, 0], polar_lines[:, 1], x1=x1)
+    line_pts = np.transpose(np.concatenate(line_pts))
+    # get line segments
+    line_seg_pts = np.stack(line_segs).reshape(len(line_segs), -1)
+
+    # compute distance between line segments and lines
+    X2_dist = cdist(line_pts, line_seg_pts,
+                    lambda lpts, spts: dist_lineseg_line(spts[:2], spts[2:], lpts[:2], lpts[2:]))
+    # assign line segments to nearest line
+    ls_labels = np.argmin(X2_dist, axis=0)
+
+    return ls_labels
+
+
+# associate line segments with segments
+
+
+def associate_segments_with_lines(lbl_ind, line_segs, ls_labels, group2line):
+    # create markers from line segments
+    im_marker = np.zeros_like(lbl_ind)
+    for line, li in zip(line_segs, ls_labels):
+        p0, p1 = line
+        rr, cc = draw.line(p0[1], p0[0], p1[1], p1[0])
+        im_marker[rr, cc] = int(group2line[li]) + 1  # avoid background class
+    # plt.imshow(im_marker)
+
+    # use watershed to assign labels to segments
+    distance = ndi.distance_transform_edt(lbl_ind)
+    segm_labels = watershed(-distance, im_marker, mask=lbl_ind)
+    return segm_labels, im_marker
+
+
+# map segment lbls to image resolution (deal with network architecture with offset)
+
+def compute_image_label_map(segm_labels, image_shape, padding=0):
+    # collect patch boxes and their labels
+    list_patch_boxes, list_patch_labels = [], []
+    for lbl_idx in np.unique(segm_labels):
+        if lbl_idx > 0:
+            # for index compute coordinate boxes
+            vx, vy = np.where(segm_labels == lbl_idx)
+            patch_boxes = label_map2image(vy, vx, segm_labels.shape[::-1]).astype(int)
+            # append
+            list_patch_boxes.append(patch_boxes)
+            list_patch_labels.append(patch_boxes.shape[0] * [lbl_idx])
+            # vis_detections(center_im, patch_boxes, max_vis=200, labels="")
+
+    patch_boxes = np.concatenate(list_patch_boxes, axis=0)
+    patch_labels = np.concatenate(list_patch_labels, axis=0)
+    # vis_detections(center_im, np.concatenate(list_patch_boxes, axis=0) , max_vis=1000, labels="")
+
+    # create segmentation map from boxes and labels
+    seg_canvas = np.zeros(image_shape[:2])
+    for bb, lbl in zip(patch_boxes, patch_labels):
+        pad = padding
+        bb[:2] = bb[:2] - pad
+        bb[2:] = bb[2:] + pad
+        # print patch_box, patch_lbl
+        seg_canvas[bb[1]:bb[3], bb[0]:bb[2]] = lbl
+
+    return seg_canvas
+
+
+def compute_line_length_from_tl(group, stats, b=128.):  # 128 / (2 * 32)
+    # collect widths
+    widths = np.zeros(len(group))
+    for ii, (sidx, sign_rec) in enumerate(group.iterrows()):
+        widths[ii] = stats.get_sign_width(sign_rec.lbl, sign_width=1) * b
+    # compute offsets and line length
+    sign_xpos = widths.cumsum() - (widths / 2.)
+    line_len = widths.sum()
+    # add columns to group
+    group['prior_line_len'] = np.rint(line_len)
+    group.loc[group.index, 'prior_sign_xoff'] = np.rint(sign_xpos)
+    group.loc[group.index, 'prior_sign_width'] = np.rint(widths)
+
+    return group
+
+
+##### full pipeline
+
+
+def post_process_line_detections(lbl_ind_x, num_lines, len_min, len_max, verbose=True):
+    # identify lines and merge them if too close together
+    # line hypotheses are stored in line_hypos dataframe
+    # line_hypos.label indicates which lines are grouped together(merged) -> line_hypo_agg
+
+    # (0) perform skeletonization
+    skeleton = skeletonize(lbl_ind_x)
+    # skeleton = skeletonize_3d(lbl_ind)
+    # skeleton = thin(lbl_ind)
+
+    # (1) compute hough transform
+    h, theta, d, theta_range, theta_range2 = compute_hough_transform(skeleton, skeleton)  # skeleton, lbl_ind_x,
+
+    # (I) find peaks in hough transform
+    num_peaks_factor = 1.9  # 1.5 1.6  v007: 1.9 v047: 2.5 # line detector dependent (VIP)
+    hl_peak_threshold = (h.max() / float(len_max)) / 2. * len_min  # 2. # has impact on length of lines found
+    accums, angles, dists = hough_line_peaks(h, theta, d, min_distance=1, min_angle=14,
+                                             num_peaks=int(num_lines * num_peaks_factor),
+                                             threshold=hl_peak_threshold)
+
+    # ugly patch for hough_line_peaks shortcomings
+    # in rare cases len(accums) != len(angles) or len(dists)
+    if len(accums) != len(angles):
+        angles = accums
+        dists = accums
+
+    # (II) check if lines intersect close to the center and group them accordingly
+    interval = [lbl_ind_x.shape[1] * 1 / 8., lbl_ind_x.shape[1] * 7 / 8.]
+    X_dist = pdist(np.stack([angles, dists], axis=1), lambda l1, l2: do_intersect_in_interval(l1, l2, interval))
+    labels = compute_group_labels_from_dists(X_dist).astype(int)
+    if verbose:
+        print('detected groups: {} | num lines: {}.'.format(len(np.unique(labels)), num_lines))
+    # collect lines in dataframe
+    line_hypos = pd.DataFrame({'accum': accums, 'angle': angles, 'dist': dists, 'label': labels})
+    line_hypos_agg = line_hypos.groupby('label').mean()
+    # add group diff column
+    diffs = line_hypos_agg.dist.sort_values().diff()
+    # compute interline median
+    dist_interline_median = diffs.median()
+
+    # (III) check if remaining groups are very close
+    X_dist = pdist(np.stack([line_hypos_agg.angle.values, line_hypos_agg.dist.values], axis=1),
+                   lambda l1, l2: nearby_and_near_parallel_2(l1, l2, dist_interline_median, interval))
+    updated_labels = compute_group_labels_from_dists(X_dist).astype(int)
+    # update dataframe and grouping
+    line_hypos.label.replace(to_replace=line_hypos_agg.index.values, value=updated_labels, inplace=True)
+
+    # (IV) re-group with updated labels and get line meta for later usage
+    line_hypos_agg = line_hypos.groupby('label').mean()
+    # group lines and remember index (needs to be here)
+    group2line = line_hypos_agg.index.values
+    # add column group dist diff
+    diffs = line_hypos_agg.dist.sort_values().diff()
+    diffs.name = 'group_diff'  # set name before join
+    line_hypos = line_hypos.join(diffs, on='label')
+    # add column group angle diff
+    angle_diff = line_hypos_agg.sort_values('dist').angle.apply(np.rad2deg).diff()
+    angle_diff.name = 'group_angle_diff'  # set name before join
+    line_hypos = line_hypos.join(angle_diff, on='label')
+    # compute interline median
+    dist_interline_median = diffs.median()
+    if verbose:
+        print('Update: detected groups: {} | num lines: {}.'.format(len(line_hypos_agg), num_lines))
+
+    # (V) label line detection segments according to line hypos
+    # compute probabilistic hough transform for lines
+    # line_segs = probabilistic_hough_line(skeleton, threshold=6, line_length=15,
+    #                             line_gap=6, theta=basic_theta)
+    line_length = 8  # v007: 8 v047: 5  # line detector dependent (VIP)
+    if len_max < line_length:
+        line_length = len_max
+    line_segs = probabilistic_hough_line(skeleton, threshold=6, line_length=line_length,
+                                         line_gap=6, theta=theta_range)
+    if len(line_segs) > 0:
+        # assign line segments to nearest line
+        ls_labels = assign_line_segments_to_lines(line_segs, line_hypos)
+
+        # associate segments with lines
+        segm_labels, im_marker = associate_segments_with_lines(lbl_ind_x, line_segs, ls_labels, group2line)
+    else:
+        segm_labels, ls_labels = lbl_ind_x, None  # set segm_labels so that line_frag gets a shape
+
+    return line_hypos, line_segs, segm_labels, ls_labels, dist_interline_median, group2line, h, theta, d, skeleton
+
+
+
@@ -0,0 +1,70 @@
+import numpy as np
+from tqdm import tqdm
+
+from ..datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta
+from ..detection.line_detection import (prepare_transliteration, preprocess_line_input, apply_detector)
+from ..utils.path_utils import make_folder
+
+from skimage.morphology import skeletonize
+
+
+def gen_line_detections(didx_list, dataset, saa_version, relative_path,
+                        line_model_version, model_fcn, re_transform, device,
+                        save_line_detections):
+
+    # for seg_im, seg_idx in dataset:
+    # iterate over segments
+    for didx in tqdm(didx_list, desc=saa_version):
+        # print(didx)
+        seg_im, gt_boxes, gt_labels = dataset[didx]
+
+        # access meta
+        seg_rec = dataset.get_seg_rec(didx)
+        image_name, scale, seg_bbox, _, view_desc = get_segment_meta(seg_rec)
+        res_name = "{}{}".format(image_name, view_desc)
+
+        # make sure seg image is large enough for line detector
+        if seg_im.size[0] > 224 and seg_im.size[1] > 224:
+
+            # prepare input
+            inputs = preprocess_line_input(seg_im, 1, shift=0)
+            center_im = re_transform(inputs[4])  # to pil image
+            center_im = np.asarray(center_im)  # to numpy
+
+            try:
+                # apply network
+                output = apply_detector(inputs, model_fcn, device)
+                # visualize_net_output(center_im, output, cunei_id=1, num_classes=2)
+                # plt.show()
+
+                # prepare output
+                outprob = np.mean(output, axis=0)
+                lbl_ind = np.argmax(outprob, axis=0)
+
+                lbl_ind_x = lbl_ind.copy()
+                lbl_ind_x[np.max(outprob, axis=0) < 0.7] = 0  # 7
+
+                lbl_ind_80 = lbl_ind.copy()
+                lbl_ind_80[np.max(outprob, axis=0) < 0.8] = 0    # remove squeeze() from outprob in order to fix a bug!
+
+                # save line detections
+                if save_line_detections:
+                    # line result folder
+                    line_res_path = "{}results/results_line/{}/{}".format(relative_path, line_model_version, saa_version)
+                    make_folder(line_res_path)
+
+                    # save lbl_ind_x
+                    outfile = "{}/{}_lbl_ind.npy".format(line_res_path, res_name)
+                    np.save(outfile, lbl_ind_x.astype(bool))
+
+                    if False:
+                        # compute skeleton
+                        skeleton = skeletonize(lbl_ind_x)
+
+                        # save skeleton
+                        outfile = "{}/{}_skeleton.npy".format(line_res_path, res_name)
+                        np.save(outfile, skeleton.astype(bool))
+            except Exception as e:
+                # Usually CUDA error: out of memory
+                print('{} {} {}'.format(res_name, e, e.args))
+
@@ -0,0 +1,184 @@
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+from tqdm import tqdm
+
+import torch
+import torch.nn.functional as F
+import torchvision.transforms as transforms
+
+from ..datasets.cunei_dataset_segments import CuneiformSegments, get_segment_meta
+
+from ..alignment.LineFragment import plot_boxes
+from ..utils.path_utils import make_folder
+
+from ..utils.torchcv.box_coder_retina import RetinaBoxCoder
+from ..utils.torchcv.box_coder_fpnssd import FPNSSDBoxCoder
+from ..utils.torchcv.box import box_nms
+from ..utils.torchcv.evaluations.voc_eval import voc_eval
+
+from ..evaluations.sign_evaluation_prep import (prepare_ssd_outputs_for_eval, prepare_ssd_gt_for_eval,
+                                                get_pred_boxes_df, get_gt_boxes_df)
+from ..evaluations.sign_evaluation import eval_detector, eval_detector_on_collection
+from ..evaluations.sign_evaluator import SignEvalBasic, SignEvalFast
+
+
+def gen_ssd_detections(didx_list, dataset, saa_version, relative_path,
+                       model_version, fpnssd_net, with_64, create_bg_class, device,
+                       test_min_score_thresh, test_nms_thresh, eval_ovthresh,
+                       save_detections, show_detections, with_4_aspects=False, verbose_mode=True, return_eval=False):
+
+    list_pred_boxes_df, list_gt_boxes_df = [], []
+    list_seg_ap, list_seg_name_with_anno = [], []
+
+    # setup evaluators
+    use_new_eval = True
+    num_classes = 240
+    # eval_basic = SignEvalBasic(model_version, saa_version, eval_ovthresh)
+    eval_fast = SignEvalFast(model_version, saa_version, tp_thresh=eval_ovthresh, num_classes=num_classes)
+
+    # iterate over segments
+    for didx in tqdm(didx_list, desc=saa_version):
+        # print(didx)
+        seg_im, gt_boxes, gt_labels = dataset[didx]
+
+        # access meta
+        seg_rec = dataset.get_seg_rec(didx)
+        image_name, scale, seg_bbox, _, view_desc = get_segment_meta(seg_rec)
+
+        # for plots
+        input_im = np.asarray(seg_im)
+
+        # prepare box coder
+        # box_coder = RetinaBoxCoder()
+        box_coder = FPNSSDBoxCoder(input_size=seg_im.size, with_64=with_64, with_4_aspects=with_4_aspects, create_bg_class=create_bg_class)
+
+        # prepare input
+        inputs = transforms.Compose([transforms.ToTensor(),
+                                     transforms.Normalize(mean=[0.5], std=[1.0])])(seg_im)
+        inputs = inputs.unsqueeze(0)
+
+        with torch.no_grad():
+            loc_preds, cls_preds = fpnssd_net(inputs.to(device))
+
+            box_preds, label_preds, score_preds = box_coder.decode(
+                loc_preds.cpu().data.squeeze(),
+                F.softmax(cls_preds.squeeze(), dim=1).cpu().data,
+                score_thresh=test_min_score_thresh, nms_thresh=test_nms_thresh)
+
+        if show_detections:
+            # plot prediction
+            plt.figure(figsize=(10, 10))
+            plot_boxes(box_preds, confidence=score_preds)
+            plt.imshow(input_im, cmap='gray')
+            plt.grid(True, color='w', linestyle=':')
+            plt.show()
+
+            # vis_detections(input_im, box_preds, scores=score_preds, labels=label_preds,
+            #                thresh=0.01, max_vis=300, figs_sz=(15, 15))  #lbl2lbl[labels]
+            # plt.show()
+
+        # convert detections to all boxes format
+        all_boxes = prepare_ssd_outputs_for_eval(box_preds, label_preds, score_preds)
+
+        if save_detections:
+            res_name = "{}{}".format(image_name, view_desc)
+            res_path = "{}results/results_ssd/{}/{}".format(relative_path, model_version, saa_version)
+
+            # check folder
+            make_folder(res_path)
+
+            if True:
+                # Save detections
+                # outfile = "{}/{}.npy".format(res_path, res_name)
+                # np.save(outfile, scores)
+
+                # save all_boxes
+                outfile = "{}/{}_all_boxes.npy".format(res_path, res_name)
+                np.save(outfile, all_boxes)
+
+        if gt_boxes is not None:
+
+            if 0:
+                if verbose_mode:
+                    # [METHOD A]: evaluate for a single segment (in tensor format)
+                    print(voc_eval([box_preds.clone()], [label_preds.clone()], [score_preds.clone()],
+                                   [gt_boxes.clone()], [gt_labels.clone()], None,
+                                   iou_thresh=eval_ovthresh, use_07_metric=False)['map'])
+
+            # convert gt to numpy format
+            gt_boxes, gt_labels = prepare_ssd_gt_for_eval(gt_boxes, gt_labels)
+
+            if use_new_eval:
+                list_seg_name_with_anno.append(image_name + view_desc)
+                if verbose_mode:
+                    print(image_name, view_desc)
+                # standard mAP eval
+                # eval_basic.eval_segment(all_boxes, gt_boxes, gt_labels, seg_rec.segm_idx, verbose=verbose_mode)
+                # fast evaluation
+                eval_fast.eval_segment(all_boxes, gt_boxes, gt_labels, seg_rec.segm_idx, verbose=verbose_mode)
+            else:
+                if verbose_mode:
+                    # [METHOD B]: evaluate mAP and print stats for a single segment
+                    # (these results can strongly differ from collection-wise evaluation)
+                    acc, df_stats = eval_detector(gt_boxes, gt_labels, all_boxes, ovthresh=eval_ovthresh)
+                    # collect results
+                    list_seg_ap.append(df_stats['ap'].mean())
+                    list_seg_name_with_anno.append(image_name + view_desc)
+
+                # prepare full collection evaluation
+                list_pred_boxes_df.append(get_pred_boxes_df(all_boxes, seg_rec.segm_idx))
+                list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_rec.segm_idx))
+
+    # full collection eval
+    if use_new_eval:
+        eval_fast.prepare_eval_collection()
+        df_stats, global_ap = eval_fast.eval_collection(verbose=verbose_mode)
+        if return_eval:
+            return global_ap, df_stats, eval_fast
+        else:
+            if verbose_mode:
+                return eval_fast.list_seg_mean_ap, list_seg_name_with_anno
+            else:
+                return global_ap, df_stats
+    else:
+        acc = 0
+        df_stats = pd.DataFrame()
+        if len(list_gt_boxes_df) > 0:
+            # [METHOD C]: compute mAP across all instances of individual classes
+            # (these results can strongly differ from segment-wise evaluation)
+            gt_boxes_df = pd.concat(list_gt_boxes_df, ignore_index=True)
+            pred_boxes_df = pd.concat(list_pred_boxes_df, ignore_index=True)
+            acc, df_stats = eval_detector_on_collection(gt_boxes_df, pred_boxes_df, ovthresh=eval_ovthresh)
+
+        if verbose_mode:
+            return list_seg_ap, list_seg_name_with_anno
+        else:
+            return acc, df_stats
+
+
+def get_detections(fpnssd_net, device, seg_im, with_64, with_4_aspects, create_bg_class,
+                   test_nms_thresh, test_min_score_thresh):
+    # prepare box coder
+    # box_coder = RetinaBoxCoder()
+    box_coder = FPNSSDBoxCoder(input_size=seg_im.size, with_64=with_64, with_4_aspects=with_4_aspects,
+                               create_bg_class=create_bg_class)
+
+    # prepare input
+    inputs = transforms.Compose([transforms.Lambda(lambda x: x.convert('L')),
+                                 transforms.ToTensor(),
+                                 transforms.Normalize(mean=[0.5], std=[1.0])])(seg_im)
+    inputs = inputs.unsqueeze(0)
+
+    with torch.no_grad():
+        loc_preds, cls_preds = fpnssd_net(inputs.to(device))
+
+        box_preds, label_preds, score_preds = box_coder.decode(
+            loc_preds.cpu().data.squeeze(),
+            F.softmax(cls_preds.squeeze(), dim=1).cpu().data,
+            score_thresh=test_min_score_thresh, nms_thresh=test_nms_thresh)
+
+    # convert detections to all boxes format
+    all_boxes = prepare_ssd_outputs_for_eval(box_preds, label_preds, score_preds)
+
+    return all_boxes
@@ -0,0 +1,194 @@
+import torch
+from torch.autograd import Variable
+import torchvision
+from torchvision.transforms import Resize, FiveCrop, CenterCrop
+
+from ..utils.transform_utils import crop_pil_image
+
+
+def crop_segment_from_tablet_im(pil_im, seg_bbox, context_pad_frac=0):
+    """
+
+    :param pil_im: full tablet image to crop from
+    :param seg_bbox: bbox coordinates [xmin, ymin, xmax, ymax]
+    :param context_pad_frac: the fraction of the minimum side length of bbox to use as padding
+    :return: cropped segment as pil image
+    """
+    min_side = min((seg_bbox[2] - seg_bbox[0], seg_bbox[3]-seg_bbox[1]))
+    context_pad = min_side * context_pad_frac
+    # crop segment
+    segment_crop, new_bbox = crop_pil_image(pil_im, seg_bbox, context_pad=context_pad, pad_to_square=False)
+    return segment_crop, new_bbox
+
+
+def rescale_segment_single(pil_im, scale):
+    """ Produce PIL image of segment at selected scale
+
+    :param pil_im: tablet segment that is to be processed
+    :param scale: scale used for resizing
+    :return: PIL image
+    """
+    # compute scaled size
+    imw, imh = pil_im.size
+    imw = int(imw * scale)
+    imh = int(imh * scale)
+    # compose transforms
+    tablet_transform = torchvision.transforms.Compose([
+            torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
+            Resize((imh, imw)),  # resize according to scale
+        ])
+    # apply transforms
+    input_im = tablet_transform(pil_im)
+    return input_im
+
+
+def preprocess_segment_single(pil_im, scale):
+    """ produce tensor of segment at selected scale
+
+    :param pil_im: tablet segment that is to be processed
+    :param scale: scale used for resizing
+    :return: 4D tensor (1xCxHxW)
+    """
+    # compute scaled size
+    imw, imh = pil_im.size
+    imw = int(imw * scale)
+    imh = int(imh * scale)
+    # compose transforms
+    tablet_transform = torchvision.transforms.Compose([
+            torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
+            Resize((imh, imw)),  # resize according to scale
+            # tensor-space transforms
+            torchvision.transforms.ToTensor(),
+            torchvision.transforms.Normalize(mean=[0.5], std=[1]),  # normalize
+        ])
+    # apply transforms
+    input_tensor = tablet_transform(pil_im).unsqueeze(0)
+    return input_tensor
+
+
+def preprocess_segment_multi_scale(pil_im, scales):
+    """ produces multiple copies of the segment at different scales
+
+    :param pil_im: tablet segment that is to be processed
+    :param scales: list of scales
+    :return: list of 3D tensors with different shapes (according to scales)
+    """
+    # get original size (scaling is applied per scale below)
+    imw, imh = pil_im.size
+    # tensor-space transforms
+    ts_transform = torchvision.transforms.Compose([
+        torchvision.transforms.ToTensor(),
+        torchvision.transforms.Normalize(mean=[0.5], std=[1]),  # normalize
+    ])
+    # compose transforms
+    tablet_transform = torchvision.transforms.Compose([
+            torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
+            # resize according to scales
+            torchvision.transforms.Lambda(
+                lambda crop: [Resize((int(imh * scale), int(imw * scale)))(crop) for scale in scales]),
+            torchvision.transforms.Lambda(
+                lambda scaled_crops: [ts_transform(crop) for crop in scaled_crops]),  # returns a list of 3D tensors
+        ])
+    # apply transforms
+    im_list = tablet_transform(pil_im)
+    return im_list
+
+
+def preprocess_segment_for_eval(pil_im, scale, shift=0):
+    """ produces five copies of the segment at slightly different offsets
+
+    :param pil_im: tablet segment that is to be processed
+    :param scale: scale which should be used for resizing
+    :param shift: offset shift used to produce five-fold oversampling
+    :return: 4D tensor with 5xCxHxW
+    """
+    # compute scaled size
+    imw, imh = pil_im.size
+    imw = int(imw * scale)
+    imh = int(imh * scale)
+    # determine crop size
+    crop_sz = [int(imh - shift), int(imw - shift)]
+    # tensor-space transforms
+    ts_transform = torchvision.transforms.Compose([
+        torchvision.transforms.ToTensor(),
+        torchvision.transforms.Normalize(mean=[0.5], std=[1]),  # normalize
+    ])
+    # compose transforms
+    tablet_transform = torchvision.transforms.Compose([
+            torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
+            Resize((imh, imw)),  # resize according to scale
+            FiveCrop((crop_sz[0], crop_sz[1])),  # oversample
+            torchvision.transforms.Lambda(
+                lambda crops: torch.stack([ts_transform(crop) for crop in crops])),  # returns a 4D tensor
+        ])
+    # apply transforms
+    input_tensor = tablet_transform(pil_im)
+    return input_tensor
+
+
+def predict_im_list(model, im_list, use_gpu, min_sz=227):
+    """ applies model to list of 3D tensors (unlike a 4D tensor in predict())
+
+    :param model: network module that is used for the prediction
+    :param im_list: list of 3D tensors
+    :param use_gpu: boolean that indicates whether GPU is available
+    :param min_sz: minimum side length of input
+    :return: list of result tensors
+    """
+    # apply network model
+    outputs = []
+    for in_im in im_list:
+        if (in_im.shape[1] >= min_sz) and (in_im.shape[2] >= min_sz):
+            # prepare input
+            if use_gpu:
+                in_var = Variable(in_im.cuda(), volatile=True)  # volatile=True -> faster, less memory usage
+            else:
+                in_var = Variable(in_im, volatile=True)
+            output = model(in_var.unsqueeze(0))
+            outputs.append(output.data.cpu().numpy())
+        else:
+            outputs.append(None)
+
+    # outputs were already converted to numpy above (None where the input was too small)
+    return outputs
+
+
+def predict(model, inputs, use_gpu, use_bbox_reg=False):
+    """ applies model to 4D tensor (batch of images)
+
+    :param model: network module that is used for the prediction
+    :param inputs: 4D tensor (batch of images)
+    :param use_gpu: boolean that indicates whether GPU is available
+    :param use_bbox_reg: boolean that indicates whether to use bbox regression
+    :return: tuple (predicted, predicted_roi) of numpy arrays; predicted_roi is an empty list if use_bbox_reg is False
+    """
+    # prepare input
+    if use_gpu:
+        inputs = Variable(inputs.cuda(), volatile=True)  # volatile=True -> faster, less memory usage
+    else:
+        inputs = Variable(inputs, volatile=True)
+    # apply network model
+    # output = model(inputs) # consumes too much memory
+
+    if use_bbox_reg:
+        scores, bboxes = [], []
+        for in_im in inputs:
+            o1, o2 = model(in_im.unsqueeze(0))
+            scores.append(o1)
+            bboxes.append(o2)
+        # concat and convert to numpy
+        output = torch.cat(scores, dim=0)
+        predicted = output.data.cpu().numpy()
+        output = torch.cat(bboxes, dim=0)
+        predicted_roi = output.data.cpu().numpy()
+    else:
+        scores = []
+        for in_im in inputs:
+            scores.append(model(in_im.unsqueeze(0)))
+        # concat and convert to numpy
+        output = torch.cat(scores, dim=0)
+        predicted = output.data.cpu().numpy()
+        predicted_roi = []
+
+    return predicted, predicted_roi
+
@@ -0,0 +1,121 @@
+# --------------------------------------------------------
+# Adapted from Ross Girshick's Fast/er R-CNN code
+# --------------------------------------------------------
+
+import os.path as osp
+import numpy as np
+# `pip install easydict` if you don't have it
+from easydict import EasyDict as edict
+
+__C = edict()
+# Consumers can get config by:
+#   from .config import cfg
+cfg = __C
+
+#
+# Detector options [legacy support]
+# These options are only used for the Basic evaluation method, if not specified in the eval scripts directly.
+#
+
+__C.TEST = edict()
+
+# Number of classes considered during testing
+__C.TEST.NUM_CLASSES = 240
+
+# Min score for any class
+# (if no class score is larger than the threshold, suppress box)
+__C.TEST.SCORE_MIN_THRESH = 0.05  # 0.01
+
+# Score threshold for ROI to be considered background
+# (if bg score in (THRESH, 1], suppress box)
+__C.TEST.SCORE_BG_THRESH = 0.7
+
+# Overlap threshold used for non-maximum suppression (suppress boxes with
+# IoU >= this threshold)
+__C.TEST.NMS = 0.3
+
+# Test using bounding-box regressors (only works if a network trained for bbox_reg is evaluated)
+__C.TEST.BBOX_REG = True
+
+# Shift applied to the five different crops during oversampling
+__C.TEST.SHIFT = 24
+
+# Min overlap with ground truth box for positive detection (if IoU <= this threshold, detection is a false positive)
+__C.TEST.TP_MIN_OVERLAP = 0.5  # 0.4
+
+
+# Data directory
+__C.DATA_DIR = '/home/tobias/Datasets/cuneiform/'
+
+# tablet directories
+__C.DATA_TEST_DIR = __C.DATA_DIR + 'test_images/'
+
+
+
+# FUNCTIONS for loading cfg
+#
+
+
+def _merge_a_into_b(a, b):
+    """Merge config dictionary a into config dictionary b, clobbering the
+    options in b whenever they are also specified in a.
+    """
+    if type(a) is not edict:
+        return
+
+    for k, v in a.items():  # items() works on both Python 2 and 3
+        # a must specify keys that are in b
+        if k not in b:  # not b.has_key(k):
+            raise KeyError('{} is not a valid config key'.format(k))
+
+        # the types must match, too
+        old_type = type(b[k])
+        if old_type is not type(v):
+            if isinstance(b[k], np.ndarray):
+                v = np.array(v, dtype=b[k].dtype)
+            else:
+                raise ValueError(('Type mismatch ({} vs. {}) '
+                                'for config key: {}').format(type(b[k]),
+                                                            type(v), k))
+
+        # recursively merge dicts
+        if type(v) is edict:
+            try:
+                _merge_a_into_b(a[k], b[k])
+            except:
+                print('Error under config key: {}'.format(k))
+                raise
+        else:
+            b[k] = v
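
Reviewer note: the merge rules of `_merge_a_into_b` (keys must already exist, types must match, nested sections are merged recursively) can be illustrated with plain dictionaries. This is a minimal Python 3 sketch, not the repository's implementation: `merge_into` is a hypothetical name, and the numpy array conversion of the original is left out.

```python
def merge_into(src, dst):
    """Recursively copy values from src into dst; keys must already exist in dst."""
    for key, value in src.items():
        if key not in dst:
            raise KeyError('{} is not a valid config key'.format(key))
        if isinstance(value, dict) and isinstance(dst[key], dict):
            merge_into(value, dst[key])  # descend into nested sections
        elif type(value) is not type(dst[key]):
            raise ValueError('Type mismatch ({} vs. {}) for config key: {}'.format(
                type(dst[key]), type(value), key))
        else:
            dst[key] = value  # override the default


defaults = {'TEST': {'NMS': 0.3, 'BBOX_REG': True, 'SHIFT': 24}}
merge_into({'TEST': {'NMS': 0.5}}, defaults)
```

Only the keys named in the override change; all other defaults are left untouched.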
+
+
+def cfg_from_file(filename):
+    """Load a config file and merge it into the default options."""
+    import yaml
+    with open(filename, 'r') as f:
+        yaml_cfg = edict(yaml.safe_load(f))  # safe_load avoids arbitrary object construction
+
+    _merge_a_into_b(yaml_cfg, __C)
+
+
+def cfg_from_list(cfg_list):
+    """Set config keys via list (e.g., from command line)."""
+    from ast import literal_eval
+    assert len(cfg_list) % 2 == 0
+    for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
+        key_list = k.split('.')
+        d = __C
+        for subkey in key_list[:-1]:
+            assert subkey in d  # d.has_key(subkey)
+            d = d[subkey]
+        subkey = key_list[-1]
+        assert subkey in d  # d.has_key(subkey)
+        try:
+            value = literal_eval(v)
+        except:
+            # handle the case when v is a string literal
+            value = v
+        assert type(value) == type(d[subkey]), \
+            'type {} does not match original type {}'.format(
+            type(value), type(d[subkey]))
+        d[subkey] = value
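
Reviewer note: `cfg_from_list` takes alternating dotted keys and string values, for example from the command line. The sketch below shows the same idea on a plain nested dict, assuming Python 3. `set_from_list` and the `options` dictionary are hypothetical and only serve as illustration.

```python
from ast import literal_eval


def set_from_list(cfg, cfg_list):
    """Apply ['A.B', 'value', ...] overrides to a nested dict of defaults."""
    if len(cfg_list) % 2 != 0:
        raise ValueError('expected alternating keys and values')
    for dotted_key, raw in zip(cfg_list[0::2], cfg_list[1::2]):
        *parents, leaf = dotted_key.split('.')
        node = cfg
        for name in parents:
            node = node[name]  # KeyError if the section does not exist
        if leaf not in node:
            raise KeyError(dotted_key)
        try:
            value = literal_eval(raw)  # numbers, booleans, lists, ...
        except (ValueError, SyntaxError):
            value = raw  # plain strings such as paths are kept as-is
        if type(value) is not type(node[leaf]):
            raise TypeError('type {} does not match original type {}'.format(
                type(value), type(node[leaf])))
        node[leaf] = value


options = {'TEST': {'NMS': 0.3, 'SHIFT': 24}, 'DATA_DIR': '/data/'}
set_from_list(options, ['TEST.NMS', '0.5', 'DATA_DIR', '/tmp/cuneiform/'])
```

The fallback to the raw string is what allows paths to be passed without quoting them as Python literals.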
@@ -0,0 +1,448 @@
+import pandas as pd
+import numpy as np
+
+from scipy.spatial.distance import cdist
+
+import matplotlib.pyplot as plt
+
+from ast import literal_eval
+
+import os.path
+
+from ..alignment.LineFragment import compute_line_endpoints_by_hypo_idx
+from ..detection.detection_helpers import radius_in_image
+from ..detection.line_detection import line_params_from_pts, hess_normal_form_from_pts, dist_lineseg_line
+
+
+class LineAnnotations(object):
+
+    def __init__(self, collection_name, coll_scales=None, interline_dist=128/2., relative_path='../'):
+        # basic paths
+        self.num_classes = 2
+        self.path_to_data_products = '{}data/annotations/'.format(relative_path)
+        self.coll_scales = coll_scales
+        self.interline_dist = interline_dist
+
+        # load collection annotations
+        self.anno_df = self.load_collection_annotations(collection_name)
+
+        if len(self.anno_df) > 0:
+            print('Load line annotations for {} dataset: {} found!'.format(collection_name,
+                                                                           self.anno_df.segm_idx.nunique()))
+        else:
+            print('No line annotations for {} dataset'.format(collection_name))
+
+    def load_collection_annotations(self, collection_name):
+        # assemble annotation file path
+        annotation_file = 'line_annotations_{}.csv'.format(collection_name)
+        annotation_file_path = '{}{}'.format(self.path_to_data_products, annotation_file)
+
+        # check if annotation file exists
+        if os.path.isfile(annotation_file_path):
+            # read annotation file
+            anno_df = pd.read_csv(annotation_file_path, engine='python')
+            # apply scale
+            if self.coll_scales is not None:
+                scale_vec = self.coll_scales[anno_df.segm_idx].values
+                anno_df.x = (anno_df.x * scale_vec).round().astype(int)
+                anno_df.y = (anno_df.y * scale_vec).round().astype(int)
+            # assemble line segs
+            anno_df = anno_df.groupby('segm_idx').apply(assemble_line_segments)
+
+            ## 0) prepare meta data columns
+            # add ls_x_separate column (depends on assemble_line_segments)
+            anno_df = anno_df.groupby(['segm_idx', 'line_idx']).apply(add_x_minmax)
+            anno_df = anno_df.groupby('segm_idx').apply(mark_x_seperate)
+            # add dist and dist_avg column
+            anno_df['dist'] = anno_df.line_segs.apply(set_line_param)
+            anno_df = anno_df.groupby(['segm_idx', 'line_idx']).apply(set_mean)
+            ##print anno_df
+            # add ls_vert_nb column (depends on assemble_line_segments)
+            #anno_df = anno_df.groupby('segm_idx').apply(mark_vert_nb, self.interline_dist * 0.8)
+
+            ## 1) group lines together
+            # set inline
+            #anno_df['inline'] = [np.intersect1d(*el) for el in anno_df[['ls_vert_nb', 'ls_x_separate']].values]
+            #anno_df['inline'] = [np.empty(0, dtype=int)] * len(anno_df)
+            anno_df['inline'] = pd.Series([np.empty(0, dtype=int)] * len(anno_df), index=anno_df.index)
+
+            # further group line segments by order and ls_x_separate (should be respected when annotating data!)
+            anno_df = anno_df.groupby('segm_idx').apply(group_ls_by_order, self.interline_dist * 5)  # * 3
+
+            # assign actual line idx
+            anno_df = anno_df.groupby('segm_idx').apply(assign_actual_line_index)
+
+            ## 2) refine ordering
+            # reset dist_avg based on gt_line_idx
+            anno_df = anno_df.groupby(['segm_idx', 'gt_line_idx']).apply(set_mean)
+            # assign actual line idx again
+            anno_df = anno_df.groupby('segm_idx').apply(assign_actual_line_index)
+
+            # return data frame
+            return anno_df
+        else:
+            # return empty list (check later with len(.) to see if file exists)
+            return []
+
+    def select_df_by_segm_idx(self, segm_idx):
+        assert len(self.anno_df) > 0, 'No annotations available!'
+        # wrap pandas logic
+        return self.anno_df[(self.anno_df.segm_idx == segm_idx)]
+
+    def visualize_line_annotations(self, segm_idx, input_im, show_line_seg_idx=False):
+        # plot line annotations
+        # get segment data frame
+        seg_line_df = self.select_df_by_segm_idx(segm_idx)
+
+        # check if any anno
+        if len(seg_line_df) > 0:
+            # create basic plot
+            fig, axes = plt.subplots(figsize=(10, 10))
+
+            grouped = seg_line_df.groupby('line_idx')
+
+            color = plt.cm.jet(np.linspace(0, 1, np.max(seg_line_df.line_idx) + 2))
+            for i, line_rec in grouped:
+                gt_line_idx = line_rec.gt_line_idx.values[0]
+                line_idx = line_rec.line_idx.values[0]
+                # print line_rec
+                axes.plot(line_rec.x.values, line_rec.y.values, linewidth=5, color=color[gt_line_idx],)
+                axes.text(line_rec.x.values[0], line_rec.y.values[0], '{}'.format(gt_line_idx),
+                           bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
+                if show_line_seg_idx:
+                    axes.text(line_rec.x.values[1], line_rec.y.values[1], '{}'.format(line_idx),
+                               bbox=dict(facecolor='red', alpha=0.5), fontsize=8, color='white')
+                # axes.set_yticks([])
+                # axes.set_xticks([])
+
+            # plot last so that the axes get overwritten (no need to remove ticks :)
+            axes.imshow(input_im, cmap='gray')
+            plt.show()
+
+    def get_hypo_line_labeling_for_segm(self, segm_idx, line_hypos_agg, verbose=False):
+
+        # select line segment ground truth
+        seg_ls_df = self.select_df_by_segm_idx(segm_idx).copy()
+        # from n points only n-1 segments -> remove empty ones
+        seg_ls_df = seg_ls_df[seg_ls_df.line_segs.apply(len) > 0]
+
+        # check if any annotations found
+        if len(seg_ls_df) > 0:
+            # assign hypo lines to gt line segments
+            gt_line_segs = seg_ls_df.line_segs.values.tolist()
+            gt_ls_lbl, gt_ls_dist = assign_lines_to_gt_line_segments(gt_line_segs, line_hypos_agg)
+            # update dataframe
+            seg_ls_df['hypo_line_lbl'] = gt_ls_lbl
+            seg_ls_df['hypo_line_dist'] = np.sqrt(gt_ls_dist)
+            # decide hypo line labels
+            seg_ls_df = seg_ls_df.groupby(['gt_line_idx']).apply(decide_hypo_line_lbl)
+        else:
+            if verbose:
+                print('No line ground truth available for segment idx [{}]!'.format(segm_idx))
+
+        return seg_ls_df
+
+    def get_assignment_for_line_hypos(self, segm_idx, line_hypos_agg):
+        # create empty dummy for cases where no annotations available
+        gt_line_assignment = pd.DataFrame()
+
+        if len(self.anno_df) > 0:
+            # get labelling
+            seg_ls_df = self.get_hypo_line_labeling_for_segm(segm_idx, line_hypos_agg)
+
+            if len(seg_ls_df) > 0:
+                # in case of multiple annotations per hypo line, pick the one with smallest distance
+                gt_line_assignment = seg_ls_df.sort_values('hypo_line_dist').groupby('hypo_line_lbl').head(1)[
+                    ['gt_line_idx', 'hypo_line_lbl']]
+                gt_line_assignment = gt_line_assignment.sort_values('gt_line_idx')
+
+        return gt_line_assignment
+
+    def visualize_hypo_line_assignments(self, segm_idx, line_hypos_agg, input_im):
+        # get labelling
+        seg_ls_df = self.get_hypo_line_labeling_for_segm(segm_idx, line_hypos_agg)
+        gt_ls_lbl = seg_ls_df.hypo_line_lbl.values
+        gt_line_segs = seg_ls_df.line_segs.values
+
+        # visualize
+        visualize_line_segments_with_labels(gt_line_segs, gt_ls_lbl, input_im)
+
+    def visualize_gt_lines_with_assignments(self, segm_idx, line_hypos_agg, center_im):
+
+        # gt assignment
+        gt_line_assignment = self.get_assignment_for_line_hypos(segm_idx, line_hypos_agg)
+
+        # get labelling
+        seg_ls_df = self.get_hypo_line_labeling_for_segm(segm_idx, line_hypos_agg)
+        gt_ls_lbl = seg_ls_df.gt_line_idx.values
+        gt_line_segs = seg_ls_df.line_segs.values
+
+        # get line hypo endpoints
+        list_hypo_endpts = [np.fliplr(np.array(compute_line_endpoints_by_hypo_idx(hidx, line_hypos_agg)).
+                                      reshape(2, 2)).ravel() for hidx in gt_line_assignment.hypo_line_lbl.values]
+
+        # get color map
+        color = plt.cm.nipy_spectral(np.linspace(0, 1, np.max(gt_ls_lbl) + 1))  # 'spectral' was removed from newer matplotlib
+
+        fig, axes = plt.subplots(1, 2, figsize=(15, 7))
+        ax = axes.ravel()
+
+        ax[0].imshow(center_im, cmap='gray')
+        ax[0].set_title('Input image')
+
+        ax[1].imshow(center_im * 0)
+        for line, li in zip(gt_line_segs, gt_ls_lbl):
+            p0, p1 = line
+            ax[1].plot((p0[0], p1[0]), (p0[1], p1[1]), color=color[li], linewidth=2)
+
+        ax[1].set_xlim((0, center_im.shape[1]))
+        ax[1].set_ylim((center_im.shape[0], 0))
+        ax[1].set_title('gt line segments and assigned line hypos')
+
+        for idx, line_pts in enumerate(list_hypo_endpts):
+            ax[1].plot(line_pts[::2], line_pts[1::2], '-', color=color[int(idx)], linewidth=2)
+            ax[1].text(line_pts[0], line_pts[1], '{}'.format(idx),
+                       bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
+
+
+
+#### HELPERS
+
+# create line segment column
+
+def assemble_line_segments(group):
+    # assemble line segments
+    line_grouped = group.groupby('line_idx')
+    line_segs = []
+    # iterate over lines
+    for lidx, lgroup in line_grouped:
+        num_pts = len(lgroup)
+        # iterate over segments
+        for sidx in range(num_pts):
+            # assemble segments
+            if sidx == num_pts - 1:
+                line_segs.append(())
+            else:
+                line_segs.append(((lgroup.iloc[sidx].x, lgroup.iloc[sidx].y),
+                                  (lgroup.iloc[sidx + 1].x, lgroup.iloc[sidx + 1].y)
+                                  ))
+    # assign to group
+    group['line_segs'] = line_segs
+
+    return group
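
Reviewer note: `assemble_line_segments` keeps one entry per annotated point, so the last point of every line gets an empty tuple as placeholder (n points yield n - 1 segments). A minimal sketch of that pairing on a plain list of points; `segments_from_points` is a hypothetical helper, not part of the repository.

```python
def segments_from_points(points):
    """Pair consecutive points; the last point gets an empty placeholder."""
    segments = []
    for idx, start in enumerate(points):
        if idx == len(points) - 1:
            segments.append(())  # n points yield only n - 1 segments
        else:
            segments.append((start, points[idx + 1]))
    return segments


line = [(10, 50), (120, 52), (240, 49)]
segments = segments_from_points(line)
```

Keeping the placeholder makes the result the same length as the input, which is why it can be assigned back as a data frame column and filtered later with `len(...) > 0`.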
+
+
+# group line segments to line
+
+def add_x_minmax(group):
+    group['xmin'] = group.x.min()
+    group['xmax'] = group.x.max()
+    return group
+
+
+def mark_x_seperate(group):
+    # iterate line segments
+    list_left_or_right = []
+    for i, (ls_idx, line_seg) in enumerate(group.iterrows()):
+        # create list of segments to the left
+        index_left = group.line_idx[group.xmax < line_seg.xmin].unique()
+        # create list of segments to the right
+        index_right = group.line_idx[group.xmin > line_seg.xmax].unique()
+        # concat and append to list
+        list_left_or_right.append(np.concatenate([np.array(index_left), np.array(index_right)]))
+    group['ls_x_separate'] = list_left_or_right
+    return group
+
+
+def set_line_param(line_seg):
+
+    if len(line_seg) > 0:
+        # use basic line equation
+        #line_params = line_params_from_pts(line_seg[0], line_seg[1])
+
+        # use Hesse normal form (incorporates angle)
+        line_params = hess_normal_form_from_pts(line_seg[0], line_seg[1])
+        return line_params[1]  # only interested in height
+    else:
+        return np.nan
+
+
+def set_mean(group):
+    group['dist_avg'] = group.dist.mean()
+    return group
+
+
+def mark_vert_nb(group, interline_thresh):
+    # iterate line segments
+    list_vert_nb = []
+    for i, (ls_idx, line_seg) in enumerate(group.iterrows()):
+        # create list of vertically nearby segments
+        index_vert_near = group.line_idx[(group.dist >= 0) &
+                                         (np.abs(group.dist_avg - line_seg.dist_avg) < interline_thresh)].unique()
+        list_vert_nb.append(np.array(index_vert_near))
+
+    group['ls_vert_nb'] = list_vert_nb
+    return group
+
+
+# def make_inline_symmetric(group):
+#     # iterate over line segments, and make symmetric reference of inline
+#     for i, (sidx, line_seg) in enumerate(group.iterrows()):
+#         if len(line_seg.inline) > 0:
+#             select_inline = group.line_idx.isin(line_seg.inline)
+#             group.loc[select_inline, 'inline'] = select_inline.sum() * [line_seg.inline]
+#     # deal with type mismatch in column (did find no better way :/)
+#     inline_list = []
+#     for el in group.inline.astype(list).values:
+#         if isinstance(el, np.ndarray):
+#             inline_list.append(el)
+#         else:
+#             inline_list.append(np.array([el]))
+#     group['inline'] = inline_list
+#     # return
+#     return group
+
+
+def group_ls_by_order(group, interline_thresh):
+    last_lidx = -1
+    last_xseparate = []
+    # QUICK FIX: use this to deal with loc and list inserts (loc[idx] works rather than loc[idx, col]!!)
+    group_inline = group.inline
+    # iter line_idx aggregate
+    # https://stackoverflow.com/questions/20067636/pandas-dataframe-get-first-row-of-each-group/49148885#49148885
+    ls_agg = group.sort_values('line_idx').groupby('line_idx').nth(0)  #.first() is dangerous
+
+    for curr_lidx, ls_agg_rec in ls_agg.iterrows():
+        if last_lidx != -1:
+            # check if last line segment is x separate
+            if np.any(np.isin(ls_agg_rec.ls_x_separate, last_lidx)):
+                last_rec = ls_agg.loc[last_lidx]
+                # check if last line segment on the left
+                ls_left = (last_rec.xmax < ls_agg_rec.xmin)
+                if ls_left:
+                    # check if vertical distance is small
+                    vert_dist_is_small = np.abs(last_rec.dist_avg - ls_agg_rec.dist_avg) < interline_thresh
+                    if vert_dist_is_small:
+                        # check if already inline
+                        if last_lidx not in ls_agg_rec.inline:
+                            # print('merge line segments {} with {}'.format(curr_lidx, last_lidx))
+                            # create new inlines
+                            # do not use ls_agg_rec.inline, since it does not get updated during loop
+                            #new_inline = np.concatenate([ls_agg_rec.inline, np.array([last_lidx])])
+                            #new_last_inline = np.concatenate([ls_agg_rec.inline, np.array([curr_lidx])])
+                            new_inline = np.concatenate([group_inline.loc[group.line_idx == curr_lidx].values[0], np.array([last_lidx])])
+                            new_last_inline = np.concatenate([group_inline.loc[group.line_idx == last_lidx].values[0], np.array([curr_lidx])])
+                            # add to data frame (loc[idx] works rather than loc[idx, col]!!)
+                            select_line_idx = (group.line_idx == curr_lidx)
+                            group_inline.loc[select_line_idx] = [new_inline] * select_line_idx.sum()
+                            select_line_idx = (group.line_idx == last_lidx)
+                            group_inline.loc[select_line_idx] = [new_last_inline] * select_line_idx.sum()
+
+        # set last values
+        last_lidx = curr_lidx
+        last_xseparate = ls_agg_rec.ls_x_separate
+    return group
+
+
+# finalize assignment
+
+def assign_actual_line_index(group):
+    # create new column
+    group['gt_line_idx'] = np.ones(len(group), dtype=int) * -1
+    # iterate over line segments and assign actual line idx (segments sorted by 1) y position, 2) x position)
+    new_idx = 0
+    for sidx, line_seg in group.sort_values(['dist_avg', 'x']).iterrows():
+        # check if index is already set
+        if group.loc[sidx, 'gt_line_idx'] == -1:
+            # assign index to line segment
+            group.loc[group.line_idx == line_seg.line_idx, 'gt_line_idx'] = new_idx
+            # assign same index to inline segments
+            for lidx in line_seg.inline:
+                group.loc[group.line_idx == lidx, 'gt_line_idx'] = new_idx
+            # finally increment index
+            new_idx += 1
+        # if index is already set, extend it to all inline members
+        else:
+            curr_idx = group.loc[sidx, 'gt_line_idx']
+            # assign same index to inline segments
+            for lidx in line_seg.inline:
+                group.loc[group.line_idx == lidx, 'gt_line_idx'] = curr_idx
+
+    return group
+
+
+# for eval need to assign detection lines to ground truth lines
+
+def assign_lines_to_gt_line_segments(gt_line_segs, line_hypos_agg):
+    # get line pts from polar lines
+    line_pts = []
+    for idx in range(len(line_hypos_agg)):
+        # compute line endpoints
+        line_pts.append(compute_line_endpoints_by_hypo_idx(idx, line_hypos_agg))
+        #line_pts.append(line_frag.compute_line_endpoints(-1, hypo_idx=i))
+
+    line_pts = np.vstack(line_pts)
+    line_pts = np.flip(line_pts.reshape((-1, 2, 2)), axis=2).reshape(-1, 4)
+
+    # get line segments
+    line_seg_pts = np.stack(gt_line_segs).reshape(len(gt_line_segs), -1)
+
+    # compute distance between line segments and lines
+    X2_dist = cdist(line_pts, line_seg_pts,
+                    lambda lpts, spts: dist_lineseg_line(spts[:2], spts[2:], lpts[:2], lpts[2:]))
+    # assign line segments to nearest line
+    ls_labels = np.argmin(X2_dist, axis=0)
+    ls_dist = np.min(X2_dist, axis=0)
+
+    return ls_labels, ls_dist
+
+
+def decide_hypo_line_lbl(group):
+    # count hypo line labels
+    uv, counts = np.unique(group.hypo_line_lbl, return_counts=True)
+    # get idx to all largest
+    largest_select = (np.max(counts) == counts)
+    # check if tiebreak is required
+    if largest_select.sum() > 1:
+        # among the equally frequent labels, compute mean hypo_line_dist and pick the one with the largest mean
+        tiebreak_df = group.groupby('hypo_line_lbl').hypo_line_dist.mean()
+        most_freq_hypo_lbl = tiebreak_df[uv[largest_select]].idxmax()
+    else:
+        most_freq_hypo_lbl = uv[np.argmax(counts)]
+
+    # assign most frequent label
+    group['hypo_line_lbl'] = most_freq_hypo_lbl
+
+    return group
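
Reviewer note: `decide_hypo_line_lbl` is a majority vote over the hypothesis labels of one ground-truth line, with a tie-break on the mean distance. The sketch below mirrors that logic without pandas, including the choice of the largest mean distance that `idxmax()` makes above. `decide_label` is a hypothetical name; how exact ties in the mean are resolved is an assumption and may differ from pandas.

```python
from collections import defaultdict


def decide_label(labels, dists):
    """Most frequent label; ties go to the label with the largest mean distance."""
    per_label = defaultdict(list)
    for lbl, dist in zip(labels, dists):
        per_label[lbl].append(dist)
    top_count = max(len(d) for d in per_label.values())
    tied = [lbl for lbl, d in per_label.items() if len(d) == top_count]
    if len(tied) == 1:
        return tied[0]
    # tie-break on mean distance, mirroring the idxmax() call above
    return max(tied, key=lambda lbl: sum(per_label[lbl]) / len(per_label[lbl]))


clear_winner = decide_label([3, 3, 5], [1.0, 2.0, 0.5])
tie_broken = decide_label([3, 5], [1.0, 4.0])
```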
+
+
+# visualize
+
+def visualize_line_segments_with_labels(gt_line_segs, gt_ls_lbl, center_im, line_hypo_endpts=None):
+
+    color = plt.cm.nipy_spectral(np.linspace(0, 1, np.max(gt_ls_lbl) + 1))  # 'spectral' was removed from newer matplotlib
+
+    fig, axes = plt.subplots(1, 2, figsize=(15, 7))
+    ax = axes.ravel()
+
+    ax[0].imshow(center_im, cmap='gray')
+    ax[0].set_title('Input image')
+
+    # ax[1].imshow(lbl_ind_x, cmap='gray')
+    # ax[1].set_title('line det')
+
+    ax[1].imshow(center_im * 0)
+    for line, li in zip(gt_line_segs, gt_ls_lbl):
+        p0, p1 = line
+        ax[1].plot((p0[0], p1[0]), (p0[1], p1[1]), color=color[li], linewidth=2)
+        ax[1].text(p0[0], p0[1], '{}'.format(li),
+                   bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
+    ax[1].set_xlim((0, center_im.shape[1]))
+    ax[1].set_ylim((center_im.shape[0], 0))
+    ax[1].set_title('gt line segments and assigned line hypos')
+
+    if line_hypo_endpts is not None:
+        for idx, line_pts in enumerate(line_hypo_endpts):
+            ax[1].plot(line_pts[::2], line_pts[1::2], '-', color=color[int(idx)], linewidth=2)
+
+
@@ -0,0 +1,15 @@
+# evaluate line-tl alignment using gt-line annotations
+# only a rough quality indicator, because transliterations are unreliable
+# (transliterations often miss lines or contain invisible lines)
+
+
+def eval_line_tl_alignment(line_frag, lines_anno, seg_idx, num_vis_lines):
+    tl_asst_eval = line_frag.line_hypos_agg[['gt_line_idx', 'tl_line']].dropna()
+    tl_asst_eval = tl_asst_eval[tl_asst_eval.tl_line >= 0]  # do not count unassigned lines
+    print('LineHypos-TL assignment accuracy: {}'.format(
+        (tl_asst_eval.gt_line_idx == tl_asst_eval.tl_line).mean()))
+    # check if consistent gt and tl
+    num_lines_tl = num_vis_lines  # line_frag.tl_df.line_idx.nunique()
+    num_lines_gt = lines_anno.select_df_by_segm_idx(seg_idx).gt_line_idx.nunique()
+    if num_lines_tl != num_lines_gt:
+        print('line annotation - transliteration mismatch: {} vs {} lines  '.format(num_lines_gt, num_lines_tl))
@@ -0,0 +1,512 @@
+import numpy as np
+import pandas as pd
+
+from tqdm import tqdm
+
+from .config import cfg
+
+from ..detection.detection_helpers import convert_detections_to_array
+from ..utils.bbox_utils import box_iou
+
+
+def voc_ap(rec, prec, use_07_metric=False):
+    """ ap = voc_ap(rec, prec, [use_07_metric])
+    Compute VOC AP given precision and recall.
+    If use_07_metric is true, uses the
+    VOC 07 11 point method (default:False).
+
+    Reference: Ross Girshick's Fast/er R-CNN code
+    """
+    if use_07_metric:
+        # 11 point metric
+        ap = 0.
+        for t in np.arange(0., 1.1, 0.1):
+            if np.sum(rec >= t) == 0:
+                p = 0
+            else:
+                p = np.max(prec[rec >= t])
+            ap = ap + p / 11.
+    else:
+        # correct AP calculation
+        # first add sentinel values at both ends
+        mrec = np.concatenate(([0.], rec, [1.]))
+        mpre = np.concatenate(([0.], prec, [0.]))
+
+        # compute the precision envelope
+        for i in range(mpre.size - 1, 0, -1):
+            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
+
+        # to calculate area under PR curve, look for points
+        # where X axis (recall) changes value
+        i = np.where(mrec[1:] != mrec[:-1])[0]
+
+        # and sum (\Delta recall) * prec
+        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
+    return ap
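
Reviewer note: the non-11-point branch of `voc_ap` integrates the monotone precision envelope over recall. A small worked example in plain Python may help to follow the arithmetic. `average_precision` is a hypothetical re-statement without numpy, and the three detections are made-up illustration data.

```python
def average_precision(recall, precision):
    """Area under the monotone precision envelope (non-11-point variant)."""
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # sweep from the right so that precision never increases with recall
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # sum precision over the steps where recall changes
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1) if mrec[i + 1] != mrec[i])


# three ranked detections (TP, FP, TP) against two ground-truth boxes
ap = average_precision([0.5, 0.5, 1.0], [1.0, 0.5, 2.0 / 3.0])
```

Here the envelope lifts the precision at the false positive from 0.5 to 2/3, so the result is 0.5 * 1 + 0.5 * 2/3 = 5/6.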
+
+
+# *BASIC* AP COMPUTATION (Fast RCNN style)
+
+def evaluate_on_gt(gt_boxes, gt_labels, num_images, all_boxes, ovthresh=None, num_classes=None, use_07_metric=False):
+    # Reference: Ross Girshick's Fast/er R-CNN code
+
+    if ovthresh is None:
+        ovthresh = cfg.TEST.TP_MIN_OVERLAP
+    if num_classes is None:
+        num_classes = cfg.TEST.NUM_CLASSES
+
+    # all detections are collected into:
+    #    all_boxes[cls][image] = N x 5 array of detections in
+    #    (x1, y1, x2, y2, score)
+    all_tp = [[[] for _ in range(num_images)]
+                for _ in range(num_classes)]
+    all_fp = [[[] for _ in range(num_images)]
+                for _ in range(num_classes)]
+    det_stats = []
+    total_num_tp = 0
+    total_false_cls = np.zeros(num_classes)
+    for j in range(1, num_classes):  # skip class 0
+        # if no detections for class available
+        if len(all_boxes[j][0]) == 0:
+            BB = np.empty((0, 4), dtype=np.float32)
+            confidence = np.empty(0, dtype=np.float32)
+        else:
+            BB = all_boxes[j][0][:, :4]
+            confidence = all_boxes[j][0][:, -1]
+
+        # sort by confidence
+        sorted_ind = np.argsort(-confidence)
+        sorted_scores = np.sort(-confidence)
+        BB = BB[sorted_ind, :]
+        inds = np.where(gt_labels == j)[0]
+        BBGT = gt_boxes[inds, :].astype(float)
+        npos = BBGT.shape[0]
+        det = [False] * npos
+
+        if npos > 0:  # else if no gt boxes available for class, AP computation is not meaningful
+
+            # go down dets and mark TPs and FPs
+            nd = len(sorted_ind)
+            tp = np.zeros(nd)
+            fp = np.zeros(nd)
+            cls_tp = []
+            cls_fp = []
+            for d in range(nd):
+                bb = BB[d, :].astype(float)
+                ovmax = -np.inf
+
+                if BBGT.size > 0:
+                    # compute overlaps
+                    # intersection
+                    ixmin = np.maximum(BBGT[:, 0], bb[0])
+                    iymin = np.maximum(BBGT[:, 1], bb[1])
+                    ixmax = np.minimum(BBGT[:, 2], bb[2])
+                    iymax = np.minimum(BBGT[:, 3], bb[3])
+                    iw = np.maximum(ixmax - ixmin + 1., 0.)
+                    ih = np.maximum(iymax - iymin + 1., 0.)
+                    inters = iw * ih
+
+                    # union
+                    uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
+                           (BBGT[:, 2] - BBGT[:, 0] + 1.) *
+                           (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
+
+                    overlaps = inters / uni
+                    ovmax = np.max(overlaps)
+                    jmax = np.argmax(overlaps)
+
+                if ovmax > ovthresh:
+                    if not det[jmax]:
+                        tp[d] = 1.
+                        det[jmax] = 1
+                        cls_tp.append(d)
+                    else:
+                        # double detection (unlikely due to nms)
+                        fp[d] = 1.
+                        cls_fp.append(d)  # comment?!
+                else:
+                    fp[d] = 1.
+                    cls_fp.append(d)
+
+            # save tp detections
+            all_tp[j][0] = np.array(cls_tp)
+            # save fp detections
+            all_fp[j][0] = np.array(cls_fp)
+            # compute precision recall
+            fp = np.cumsum(fp)
+            tp = np.cumsum(tp)
+            rec = tp / float(npos)
+            # avoid divide by zero in case the first detection matches a difficult
+            # ground truth
+            prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
+            ap = voc_ap(rec, prec, use_07_metric)
+            # print rec, prec, ap
+            num_tp = np.sum(det).astype(int)
+            total_num_tp += num_tp
+            det_stats.append([npos, nd, num_tp, nd-num_tp, ap, j])
+        else:
+            if len(BB) > 0:
+                total_false_cls[j] += len(BB)
+                #print 'outlier class:', j, len(BB)
+    select_nonzero = total_false_cls > 0
+    # print(np.nonzero(select_nonzero), total_false_cls[select_nonzero])
+    return all_tp, all_fp, det_stats, total_num_tp  #, total_false_cls
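
Reviewer note: the inner loop of `evaluate_on_gt` computes IoU with inclusive pixel coordinates (the `+ 1` terms) and then greedily matches detections, sorted by score, to ground-truth boxes. A minimal sketch of both steps for a single class and image; `box_iou_inclusive` and `mark_tp_fp` are hypothetical helpers and the boxes are made-up illustration data.

```python
def box_iou_inclusive(a, b):
    """IoU of two (x1, y1, x2, y2) boxes with inclusive pixel coordinates."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]) + 1.0, 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]) + 1.0, 0.0)
    inter = iw * ih
    area_a = (a[2] - a[0] + 1.0) * (a[3] - a[1] + 1.0)
    area_b = (b[2] - b[0] + 1.0) * (b[3] - b[1] + 1.0)
    return inter / (area_a + area_b - inter)


def mark_tp_fp(dets, gts, ovthresh=0.5):
    """dets sorted by descending score; returns a list of 'tp'/'fp' flags."""
    matched = [False] * len(gts)
    flags = []
    for det in dets:
        overlaps = [box_iou_inclusive(det, gt) for gt in gts]
        best = max(range(len(gts)), key=overlaps.__getitem__) if gts else None
        if best is not None and overlaps[best] > ovthresh and not matched[best]:
            matched[best] = True
            flags.append('tp')
        else:
            flags.append('fp')  # low overlap or duplicate detection
    return flags


gt = [(0, 0, 9, 9)]
flags = mark_tp_fp([(0, 0, 9, 9), (1, 1, 9, 9), (20, 20, 29, 29)], gt)
```

The second detection overlaps the ground truth well (IoU 0.81) but is still counted as a false positive, because the box was already claimed by a higher-scoring detection.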
+
+
+def df_evaluate_on_gt(gt_boxes_df, pred_boxes_df, ovthresh=None, num_classes=None, use_07_metric=False):
+    # Reference: Ross Girshick's Fast/er R-CNN code
+
+    if ovthresh is None:
+        ovthresh = cfg.TEST.TP_MIN_OVERLAP
+    if num_classes is None:
+        num_classes = cfg.TEST.NUM_CLASSES
+    num_images = gt_boxes_df.seg_idx.nunique()
+
+    # sort by confidence
+    pred_boxes_df = pred_boxes_df.sort_values('conf', ascending=False)
+
+    det = [False] * len(gt_boxes_df)
+
+    det_stats = []
+    total_num_tp = 0
+    for j in tqdm(range(1, num_classes)):  # skip class 0
+        cls_dets_df = pred_boxes_df[pred_boxes_df.cls == j]
+        cls_gt_df = gt_boxes_df[gt_boxes_df.cls == j]
+
+        # get bounding box and image ids
+        BB = cls_dets_df[['x1', 'y1', 'x2', 'y2']].values
+        image_ids = cls_dets_df.seg_idx.values
+        # confidence = cls_dets_df.conf.values
+
+        npos = len(cls_gt_df)
+
+        if npos > 0:  # else if no gt boxes available for class, AP computation is not meaningful
+
+            # go down dets and mark TPs and FPs
+            nd = len(cls_dets_df)
+            tp = np.zeros(nd)
+            fp = np.zeros(nd)
+
+            for d in range(nd):
+                ovmax = -np.inf
+
+                # get bbox and seg_idx
+                bb = BB[d, :].astype(float)
+                seg_idx = image_ids[d]
+
+                # get gt boxes
+                seg_cls_gt_df = cls_gt_df[cls_gt_df.seg_idx == seg_idx]
+                BBGT = seg_cls_gt_df[['x1', 'y1', 'x2', 'y2']].values.astype(float)
+
+                if BBGT.size > 0:
+                    # compute overlaps
+                    # intersection
+                    ixmin = np.maximum(BBGT[:, 0], bb[0])
+                    iymin = np.maximum(BBGT[:, 1], bb[1])
+                    ixmax = np.minimum(BBGT[:, 2], bb[2])
+                    iymax = np.minimum(BBGT[:, 3], bb[3])
+                    iw = np.maximum(ixmax - ixmin + 1., 0.)
+                    ih = np.maximum(iymax - iymin + 1., 0.)
+                    inters = iw * ih
+
+                    # union
+                    uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
+                           (BBGT[:, 2] - BBGT[:, 0] + 1.) *
+                           (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
+
+                    overlaps = inters / uni
+                    ovmax = np.max(overlaps)
+                    jmax = np.argmax(overlaps)
+
+                if ovmax > ovthresh:
+                    # map seg_cls idx to global idx
+                    gidx = seg_cls_gt_df.index.values[jmax]
+                    if not det[gidx]:
+                        tp[d] = 1.
+                        det[gidx] = 1
+                    else:
+                        # double detection (unlikely due to nms)
+                        fp[d] = 1.
+                else:
+                    fp[d] = 1.
+            # compute num tp before cumsum (!)
+            num_tp = np.sum(tp).astype(int)
+            # compute precision recall
+            fp = np.cumsum(fp)
+            tp = np.cumsum(tp)
+            rec = tp / float(npos)
+            # avoid divide by zero in case the first detection matches a difficult
+            # ground truth
+            prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
+            ap = voc_ap(rec, prec, use_07_metric)
+            # print rec, prec, ap
+            total_num_tp += num_tp
+            det_stats.append([npos, nd, num_tp, nd-num_tp, ap, j])
+            # print np.sum(det), total_num_tp
+        else:
+            if len(cls_dets_df) > 0:
+                if False:  # turn on for debugging to see which classes are missing
+                    print('outlier class:', j, len(BB))
+
+    return det_stats, total_num_tp
+
+
+def eval_detector(gt_boxes, gt_labels, all_boxes, ovthresh=None, verbose=True):
+    # evaluate
+    num_imgs = 1
+    all_tp, all_fp, det_stats, total_num_tp = evaluate_on_gt(gt_boxes, gt_labels, num_imgs, all_boxes,
+                                                             ovthresh=ovthresh)
+
+    total_num_fp = int(np.sum(np.array(det_stats)[:, 3]))
+    # print stats
+    pd.set_option('display.max_rows', 50)
+    df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
+
+    if verbose:
+        print("total_tp", total_num_tp, "total_fp", total_num_fp,
+              "mAP", '{:0.4f}'.format(df_stats['ap'].mean()),
+              "mAP(nonzero)", '{:0.4f}'.format(df_stats['ap'][df_stats['ap'] != 0].mean()))
+    acc = total_num_tp / float(total_num_tp + total_num_fp)
+
+    return acc, df_stats
+
+
+def eval_detector_on_collection(gt_boxes_df, pred_boxes_df, ovthresh=None):
+    det_stats, total_num_tp = df_evaluate_on_gt(gt_boxes_df, pred_boxes_df, ovthresh=ovthresh)
+
+    total_num_fp = int(np.sum(np.array(det_stats)[:, 3]))
+    # print stats
+    pd.set_option('display.max_rows', 50)
+    df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
+
+    print('RESULTS ON FULL COLLECTION :')
+    print("total_tp", total_num_tp, "total_fp", total_num_fp,
+          "acc", '{:0.3f}'.format(total_num_tp / float(total_num_tp + total_num_fp)),
+          "mAP", '{:0.4f}'.format(df_stats['ap'].mean()),
+          "mAP(nonzero)", '{:0.4f}'.format(df_stats['ap'][df_stats['ap'] != 0].mean()))
+    acc = total_num_tp / float(total_num_tp + total_num_fp)
+
+    return acc, df_stats
+
+
+# *FAST* AP COMPUTATION
+
+
+# prepare AP computation
+
+
+def add_max_det(group):
+    # add column to dataframe
+    group['max_det'] = False
+    # select detections marked as TP
+    tp_group = group[group.det_type == 3]
+    # only one can be TP, others are double detections
+    if len(tp_group) > 0:
+        # mark the highest-scoring TP as the match (single .loc call avoids chained assignment)
+        group.loc[tp_group.score.idxmax(), 'max_det'] = True
+    return group
+
+
+def add_det_type_column(eval_df, tp_thresh=0.5, bg_thresh=0.2):
+    # based on "Diagnosing Error in Object Detectors" by Hoiem et al.
+    # modifications:
+    # sim and other categories are merged, since every sign is considered similar
+    # bg_thresh is 0.2 instead of default 0.1
+
+    # determine detection types
+
+    type_list = []
+    for didx, det_rec in eval_df.iterrows():
+        overlap = det_rec.overlap
+        # class matches
+        if det_rec.pred == det_rec.true:
+            if overlap > tp_thresh:
+                type_list.append(3)  # TP (3)
+            elif overlap > bg_thresh:
+                type_list.append(0)  # FP: Loc(0) confusion
+            else:
+                type_list.append(2)  # FP: BG(2) confusion
+        else:
+            if overlap > bg_thresh:
+                type_list.append(1)  # FP: Sim/Oth(1) confusion
+            else:
+                type_list.append(2)  # FP: BG(2) confusion
+
+    # add column to dataframe
+    eval_df['det_type'] = type_list
+
+    return eval_df
+
+
+def prepare_eval_df(all_boxes, gt_boxes, gt_labels, seg_idx, tp_thresh, bg_thresh):
+    """ prepare eval_df that contains most information for average precision computation """
+    # convert all_boxes to ndarray (N x 9)
+    # [ID, cx, cy, score, x1, y1, x2, y2, idx]  bbox = [4:8]  ctr  = [1:3]
+    sign_detections = convert_detections_to_array(all_boxes)
+
+    # compute ious between detections and gt_boxes
+    ious = box_iou(sign_detections[:, 4:8], gt_boxes)
+
+    # for each detection get best fit with gt box
+    index_gt = np.argmax(ious, axis=1)
+    overlap_gt = np.max(ious, axis=1)
+    label_gt = gt_labels[index_gt]
+
+    # collect in data frame
+    eval_df = pd.DataFrame(np.hstack([overlap_gt.reshape(-1, 1), label_gt.reshape(-1, 1),
+                                      sign_detections[:, [0, 3, 8]], index_gt.reshape(-1, 1)]),
+                           columns=['overlap', 'true', 'pred', 'score', 'det_idx', 'gt_idx'])
+    # add column with segment index
+    eval_df['seg_idx'] = seg_idx
+    # add det_type column (0:LOC, 1:SIM, 2:BG, 3:TP)
+    eval_df = add_det_type_column(eval_df, tp_thresh, bg_thresh)
+    # compute max_det (in order to find double detections)
+    eval_df = eval_df.groupby('gt_idx').apply(add_max_det)
+
+    return eval_df
+
+
+# AP computation
+
+
+def compute_mean_ap(col_eval_df, gt_df, num_classes=240, class_list=None, verbose=True):
+    """ compute mean class AP """
+
+    # define list of classes to evaluate over
+    if class_list is None:
+        class_list = np.arange(1, num_classes)  # range(1, num_classes)
+    col_eval_df = col_eval_df.sort_values('score', ascending=False)
+    if False:
+        # filter gt according to considered segments
+        bbox_anno = None
+        gt_df = bbox_anno.anno_df[bbox_anno.anno_df.segm_idx.isin(col_eval_df.seg_idx.unique())]
+        gt_df['cls'] = gt_df.train_label
+
+    # compute class counts
+    gt_counts = gt_df.cls.value_counts()
+
+    det_stats = []
+    for cls_idx in class_list:
+        # get class predictions
+        cls_det_df = col_eval_df[col_eval_df.pred == cls_idx]
+        # get gt number
+        if cls_idx in gt_counts.index:
+            npos = gt_counts[cls_idx]
+        else:
+            npos = 0
+        if npos > 0:
+            if 1:
+                tp_vec = (cls_det_df.det_type == 3) & (cls_det_df.max_det == True)
+                fp_vec = ~tp_vec
+                # fp_vec = (cls_det_df.det_type < 3) | (cls_det_df.max_det == False)
+                fp = np.cumsum(fp_vec.values)
+                tp = np.cumsum(tp_vec.values)
+
+                assert np.all(tp_vec != fp_vec), np.intersect1d(tp_vec, fp_vec)
+            else:
+                # without considering double detections
+                fp = np.cumsum(cls_det_df.det_type < 3)
+                tp = np.cumsum(cls_det_df.det_type == 3)
+
+            rec = tp / float(npos)
+            prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
+            ap = voc_ap(rec, prec, False)
+            # sum is used to map empty list to 0
+            det_stats.append([npos, len(cls_det_df), np.sum(tp[-1:]), np.sum(fp[-1:]), ap, cls_idx])
+        else:
+            if len(cls_det_df) > 0:
+                if False:  # turn on for debugging to see which classes are missing
+                    print('outlier class:', cls_idx, len(cls_det_df))
+    # convert to ndarray
+    det_stats = np.asarray(det_stats)
+    mean_ap = np.mean(det_stats[:, -2])
+    # return aps
+    if verbose:
+        print('mAP {:.4}'.format(mean_ap))
+    return det_stats
+
+
+def compute_global_ap(col_eval_df, gt_df, num_classes=240, verbose=True):
+    """ compute global AP """
+
+    # sort according to score
+    col_eval_df = col_eval_df.sort_values('score', ascending=False)
+    # not strictly necessary, because predicted classes are only in range [1, num_classes) anyway
+    cls_det_df = col_eval_df[col_eval_df.pred.isin(range(1, num_classes))]
+    if False:
+        # filter gt according to considered segments
+        bbox_anno = None
+        gt_df = bbox_anno.anno_df[bbox_anno.anno_df.segm_idx.isin(col_eval_df.seg_idx.unique())]
+        gt_df['cls'] = gt_df.train_label
+    # filter considered classes
+    gt_df = gt_df[gt_df.cls.isin(range(1, num_classes))]
+
+    # select number of gt positives
+    npos = len(gt_df)
+    # npos = len(bbox_anno.anno_df.train_label[bbox_anno.anno_df.train_label > 0])
+
+    ap = 0
+    if npos > 0:
+        if 1:
+            tp_vec = (cls_det_df.det_type == 3) & (cls_det_df.max_det == True)
+            fp_vec = ~tp_vec
+            fp = np.cumsum(fp_vec)
+            tp = np.cumsum(tp_vec)
+
+            assert np.all(tp_vec != fp_vec), np.intersect1d(tp_vec, fp_vec)
+        else:
+            # without considering double detections
+            fp = np.cumsum(cls_det_df.det_type < 3)
+            tp = np.cumsum(cls_det_df.det_type == 3)
+
+        rec = tp / float(npos)
+        prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
+        ap = voc_ap(rec, prec, False)
+
+        if False:
+            from sklearn.metrics import precision_recall_curve, auc
+            import matplotlib.pyplot as plt
+
+            # compute normalized PR curve
+            precision, recall, _ = precision_recall_curve(tp_vec, cls_det_df.score.values)
+            # plot pr curve
+            plt.figure()
+            plt.step(recall, precision, color='b', alpha=0.2, where='post')
+            # plt.step(rec, prec, color='b', alpha=0.2)  # works, but rec values not normalized to [0, 1] range
+
+            # compare different ways to compute VOC AP (ie. area under the precision recall curve)
+            # first two methods should produce same results, but there are slight differences
+            # in doubt use original VOC AP code
+            # https://datascience.stackexchange.com/questions/25119/how-to-calculate-map-for-detection-task-for-the-pascal-voc-challenge
+            # https://github.com/rafaelpadilla/Object-Detection-Metrics
+            plt.title('voc ap: {:.3} | PR AUC: {:.3} | norm. PR AUC: {:.3}'.format(voc_ap(rec, prec, False),
+                                                                                   auc(rec, prec),
+                                                                                   auc(recall, precision)))
+            plt.show()
+
+    # return ap
+    if verbose:
+        print('global AP {:.4}'.format(ap))
+    return ap
+
+
+# FP categorization
+
+
+def get_type_val_frac(fp_type_series, type_values=[0, 1, 2, 3], num_fp_thres=[5, 10, 25, 50, 100]):
+    # type_values = [0, 1, 2, 3]
+    # num_fp_thres = [5, 10, 25, 50, 100]
+
+    type_val_frac = np.zeros((len(num_fp_thres), len(type_values)))
+    for i, thres in enumerate(num_fp_thres):
+        type_counts = fp_type_series[:thres].value_counts(normalize=True, sort=True)
+        for j, val in enumerate(type_values):
+            val_check = type_counts.index.values == val
+            if np.any(val_check):
+                val_idx = np.argmax(val_check)
+                type_val_frac[i, j] = type_counts.iloc[val_idx]
+    return type_val_frac
+
+
+
+
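The per-class matching used by `evaluate_on_gt` and `df_evaluate_on_gt` above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the repository's code: `iou_voc`, `match_detections` and `precision_recall` are hypothetical helper names, boxes are assumed to be `[x1, y1, x2, y2]` pixel coordinates with the inclusive `+1` convention seen above, and all boxes are assumed to belong to one class and one segment.

```python
import numpy as np


def iou_voc(bb, bbgt):
    """IoU of one box against an (N, 4) array, inclusive +1 pixel convention."""
    ixmin = np.maximum(bbgt[:, 0], bb[0])
    iymin = np.maximum(bbgt[:, 1], bb[1])
    ixmax = np.minimum(bbgt[:, 2], bb[2])
    iymax = np.minimum(bbgt[:, 3], bb[3])
    iw = np.maximum(ixmax - ixmin + 1., 0.)
    ih = np.maximum(iymax - iymin + 1., 0.)
    inters = iw * ih
    uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.)
           + (bbgt[:, 2] - bbgt[:, 0] + 1.) * (bbgt[:, 3] - bbgt[:, 1] + 1.)
           - inters)
    return inters / uni


def match_detections(dets, scores, gts, ovthresh=0.5):
    """Greedy matching in descending score order.

    Returns TP and FP flags ordered by rank (highest score first).
    """
    dets = np.asarray(dets, dtype=float).reshape(-1, 4)
    gts = np.asarray(gts, dtype=float).reshape(-1, 4)
    order = np.argsort(-np.asarray(scores, dtype=float))
    claimed = np.zeros(len(gts), dtype=bool)
    tp = np.zeros(len(dets))
    fp = np.zeros(len(dets))
    for rank, d in enumerate(order):
        if len(gts) > 0:
            overlaps = iou_voc(dets[d], gts)
            jmax = int(np.argmax(overlaps))
            if overlaps[jmax] > ovthresh and not claimed[jmax]:
                # first detection to claim this ground truth box
                claimed[jmax] = True
                tp[rank] = 1.
                continue
        # low overlap, or the ground truth box was already claimed (double detection)
        fp[rank] = 1.
    return tp, fp


def precision_recall(tp, fp, npos):
    """Cumulative precision and recall over the ranked detections."""
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(fp)
    rec = tp_cum / float(npos)
    prec = tp_cum / np.maximum(tp_cum + fp_cum, np.finfo(np.float64).eps)
    return prec, rec


gts = [[0, 0, 9, 9]]
dets = [[0, 0, 9, 9], [1, 1, 9, 9], [50, 50, 59, 59]]
tp, fp = match_detections(dets, [0.9, 0.8, 0.7], gts)
prec, rec = precision_recall(tp, fp, npos=len(gts))
```

In this toy case the second detection overlaps the ground truth box well but counts as a false positive, because the higher-scoring first detection has already claimed that box.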
@@ -0,0 +1,156 @@
+import pandas as pd
+import numpy as np
+from ast import literal_eval
+
+import os.path
+
+from .config import cfg
+
+from ..detection.detection_helpers import scale_detection_boxes, correct_for_shift, crop_bboxes_from_im
+
+# class to wrap annotations
+
+
+class BBoxAnnotations(object):
+
+    def __init__(self, collection_name, relative_path='../'):
+        # basic paths
+        self.data_root = cfg.DATA_TEST_DIR
+        self.num_classes = cfg.TEST.NUM_CLASSES
+        self.path_to_data_products = '{}data/annotations/'.format(relative_path)
+
+        # load collection annotations
+        self.anno_df = self.load_collection_annotations(collection_name)
+
+        if len(self.anno_df) > 0:
+            print('Load bbox annotations for {} dataset: {} found!'.format(collection_name,
+                                                                           self.anno_df.segm_idx.nunique()))
+        else:
+            print('No bbox annotations for {} dataset'.format(collection_name))
+
+    def load_collection_annotations(self, collection_name):
+        # assemble annotation file path
+        annotation_file = 'bbox_annotations_{}.csv'.format(collection_name)
+        annotation_file_path = '{}{}'.format(self.path_to_data_products, annotation_file)
+
+        # check if annotation file exists
+        if os.path.isfile(annotation_file_path):
+            # read annotation file
+            anno_df = pd.read_csv(annotation_file_path, engine='python')
+            # convert string of list to list
+            anno_df['relative_bbox'] = anno_df['relative_bbox'].apply(literal_eval)
+            anno_df['bbox'] = anno_df['bbox'].apply(literal_eval)
+            # return data frame
+            return anno_df
+        else:
+            # return empty list (check later with len(.) to see if file exists)
+            return []
+
+    def select_anno_df_by_segm_idx(self, segm_idx):
+        # wrap pandas logic
+        return self.anno_df[(self.anno_df.segm_idx == segm_idx)]
+
+    def select_anno_df_by_cdli_and_view(self, cdli, view):
+        # wrap pandas logic
+        return self.anno_df[(self.anno_df.tablet_CDLI == cdli) & (self.anno_df.view_desc == view)]
+
+
+# static functions
+def get_boxes_and_labels(anno_df):
+    # retrieves gt_boxes and gt_labels from anno_df
+    #gt_boxes = np.stack(anno_df.bbox.values)
+    if len(anno_df) > 0:
+        gt_boxes = np.stack(anno_df.relative_bbox.values)  # use relative bbox
+        gt_labels = anno_df.train_label.values
+    else:
+        gt_boxes, gt_labels = np.array([]), np.array([])  # just dummy
+    return gt_boxes, gt_labels
+
+
+def get_class_gt_boxes(gt_boxes, gt_labels, cls_id):
+    inds = np.where(gt_labels == cls_id)[0]
+    return gt_boxes[inds, :]
+
+
+def apply_scaling_and_shift(gt_boxes, scaling=1, shift=0):
+    # if used, should be applied before calling eval
+    # apply scaling of detection boxes
+    gt_boxes = scale_detection_boxes(gt_boxes, scaling)
+    # apply shift of detection boxes due to center crop
+    gt_boxes = correct_for_shift(gt_boxes, shift)
+    return gt_boxes
+
+
+def apply_scaling(gt_boxes, scaling=1):
+    # if used, should be applied before calling eval
+    # apply scaling of detection boxes
+    gt_boxes = scale_detection_boxes(gt_boxes, scaling)
+    return gt_boxes
+
+
+def collect_gt_crops(gt_boxes, gt_labels, im, num_classes, max_vis=2):
+    # takes tablet image
+    # returns list of ground truth crops organized by class
+    gt_crops = [[] for _ in range(num_classes)]
+    for j in range(1, num_classes):
+        BBGT = get_class_gt_boxes(gt_boxes, gt_labels, j).astype(float)
+        npos = BBGT.shape[0]
+        if npos > 0:
+            # get boxes
+            bboxes = BBGT[:, :4]  # remove any additional dims
+            ncrops = min(max_vis, bboxes.shape[0])
+            gt_crops[j] = crop_bboxes_from_im(im, bboxes[:ncrops, :])
+    return gt_crops
+
+
+def prepare_segment_gt(segm_idx, segm_scale, bbox_anno, with_star_crop=False):
+    # this is how things work together
+
+    # create empty lists in case no annotations available
+    gt_boxes, gt_labels = [], []
+
+    if len(bbox_anno.anno_df) > 0:
+        # select annotations for specific segment
+        sub_anno_df = bbox_anno.select_anno_df_by_segm_idx(segm_idx)
+        # get boxes and labels
+        gt_boxes, gt_labels = get_boxes_and_labels(sub_anno_df)
+        # adapt gt boxes to input format
+        if with_star_crop:
+            gt_boxes = apply_scaling_and_shift(gt_boxes, scaling=segm_scale, shift=-cfg.TEST.SHIFT / 2.)
+        else:
+            gt_boxes = apply_scaling(gt_boxes, scaling=segm_scale)
+
+    # return selected ground truth
+    return gt_boxes, gt_labels
+
+
+# def get_pred_boxes_df(all_boxes, seg_idx):
+#     # iterate list
+#     list_boxes = []
+#     list_cls_idx = []
+#     for cls, boxes in enumerate(all_boxes):
+#         num_boxes = len(boxes[0])
+#         if num_boxes > 0:
+#             list_boxes.append(boxes)
+#             list_cls_idx.extend([cls] * num_boxes)
+#     # create df
+#     pred_boxes_df = pd.DataFrame()  # []
+#     if len(list_boxes) > 0:
+#         pred_boxes_df = pd.DataFrame(np.hstack(list_boxes).reshape(-1, 5), columns=['x1', 'y1', 'x2', 'y2', 'conf'])
+#         pred_boxes_df['cls'] = list_cls_idx
+#         pred_boxes_df['seg_idx'] = seg_idx
+#
+#     return pred_boxes_df
+#
+#
+# def get_gt_boxes_df(gt_boxes, gt_labels, seg_idx):
+#     # create df
+#     gt_boxes_df = pd.DataFrame()  # []
+#     if len(gt_boxes) > 0:
+#         gt_boxes_df = pd.DataFrame(gt_boxes, columns=['x1', 'y1', 'x2', 'y2'])
+#         gt_boxes_df['cls'] = gt_labels
+#         gt_boxes_df['seg_idx'] = seg_idx
+#     return gt_boxes_df
+
+
+
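`load_collection_annotations` and `get_boxes_and_labels` above read box columns that are stored as strings in the CSV, parse them with `literal_eval`, and stack them into an array. The following is a minimal sketch of that round trip. The two-row CSV is invented for illustration; only the column names `segm_idx`, `train_label` and `relative_bbox` are taken from the code above.

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd

# hypothetical annotation file content; real files live under data/annotations/
csv_text = u'''segm_idx,train_label,relative_bbox
0,12,"[10, 20, 50, 60]"
0,7,"[55, 20, 90, 60]"
'''

anno_df = pd.read_csv(io.StringIO(csv_text))
# each cell is the string "[x1, y1, x2, y2]"; literal_eval turns it into a list
anno_df['relative_bbox'] = anno_df['relative_bbox'].apply(literal_eval)

# stack the per-row lists into an (N, 4) array, as get_boxes_and_labels does
gt_boxes = np.stack(anno_df.relative_bbox.values)
gt_labels = anno_df.train_label.values
```

`literal_eval` only accepts Python literals, so it is a safer choice than `eval` for parsing values read from a file.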
@@ -0,0 +1,101 @@
+import pandas as pd
+import numpy as np
+
+
+def get_pred_boxes_df(all_boxes, seg_idx):
+    # iterate list
+    list_boxes = []
+    list_cls_idx = []
+    for cls, boxes in enumerate(all_boxes):
+        num_boxes = len(boxes[0])
+        if num_boxes > 0:
+            list_boxes.append(boxes)
+            list_cls_idx.extend([cls] * num_boxes)
+    # create df
+    pred_boxes_df = pd.DataFrame()  # []
+    if len(list_boxes) > 0:
+        pred_boxes_df = pd.DataFrame(np.hstack(list_boxes).reshape(-1, 5), columns=['x1', 'y1', 'x2', 'y2', 'conf'])
+        pred_boxes_df['cls'] = list_cls_idx
+        pred_boxes_df['seg_idx'] = seg_idx
+
+    return pred_boxes_df
+
+
+def get_gt_boxes_df(gt_boxes, gt_labels, seg_idx):
+    # create df
+    gt_boxes_df = pd.DataFrame()  # []
+    if len(gt_boxes) > 0:
+        gt_boxes_df = pd.DataFrame(gt_boxes, columns=['x1', 'y1', 'x2', 'y2'])
+        gt_boxes_df['cls'] = gt_labels
+        gt_boxes_df['seg_idx'] = seg_idx
+    return gt_boxes_df
+
+
+# SSD specific
+
+
+def convert_detections_for_eval(pred_boxes, pred_labels, pred_scores, total_labels=240):
+
+    # convert from ssd detector format to all_boxes
+    all_boxes = [[] for _ in range(total_labels)]
+
+    for boxes, labels, scores in zip(pred_boxes, pred_labels, pred_scores):
+        for bbox, lbl, score in zip(boxes, labels, scores):
+            # temp: [ID, cx, cy, score, x1, y1, x2, y2, idx]
+
+            # copy data to _new_ all_boxes
+            box = np.zeros((1, 5))
+            box[0, :4] = bbox
+            box[0, 4] = score
+            all_boxes[int(lbl)].append(box)
+
+    # for each class stack list of bounding boxes together
+    all_boxes = [np.stack(el).squeeze(axis=1) if len(el) > 0 else el for el in all_boxes]
+
+    return all_boxes
+
+
+def prepare_ssd_outputs_for_eval(box_preds, label_preds, score_preds, num_classes=240):
+
+    if len(box_preds) > 0:
+        # Wrap VOC evaluation for PyTorch
+        pred_boxes = [b.numpy() for b in [box_preds]]
+        pred_labels = [label.numpy() for label in [label_preds]]
+        pred_scores = [score.numpy() for score in [score_preds]]
+
+        # convert to all boxes and stack tiles (better would be to have single tile for whole segment)
+        all_boxes = convert_detections_for_eval(pred_boxes, pred_labels, pred_scores, num_classes)
+        all_boxes = [[el] for el in all_boxes]
+    else:
+        # deal with case if there are not any detections
+        all_boxes = [[] for _ in range(num_classes)]
+        all_boxes = [[el] for el in all_boxes]
+
+    return all_boxes
+
+
+def prepare_ssd_gt_for_eval(gt_boxes, gt_labels):
+    gt_boxes = [b.numpy() for b in [gt_boxes]]
+    gt_labels = [label.numpy() for label in [gt_labels]]
+
+    return gt_boxes[0], gt_labels[0]
+
+
+# alignment specific
+
+def convert_to_all_boxes(seg_gen_annos, relative_bboxes, scale, num_labels):
+    all_boxes = [[] for _ in range(num_labels)]
+
+    for anno_idx, anno_rec in seg_gen_annos.iterrows():
+        # [x1, y1, x2, y2, score]
+        box = np.zeros((1, 5))
+        box[0, :4] = np.array(relative_bboxes[anno_idx]) * scale
+        box[0, 4] = anno_rec.det_score
+        # assign to class
+        all_boxes[anno_rec.newLabel].append(box)
+
+    # for each class stack list of bounding boxes together
+    all_boxes = [np.stack(el).squeeze(axis=1) if len(el) > 0 else el for el in all_boxes]
+
+    return all_boxes
+
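The conversion helpers above pass detections around in an `all_boxes` layout: one entry per class, each holding rows of `[x1, y1, x2, y2, score]`. Below is a simplified sketch of that layout and of flattening it into a data frame. `to_all_boxes` and `all_boxes_to_df` are hypothetical names. Unlike the repository code, the sketch omits the extra per-image list level and uses an empty `(0, 5)` array rather than an empty list for classes without detections.

```python
import numpy as np
import pandas as pd


def to_all_boxes(boxes, labels, scores, num_classes):
    """Group detections by class label into (N, 5) arrays [x1, y1, x2, y2, score]."""
    per_class = [[] for _ in range(num_classes)]
    for bbox, lbl, score in zip(boxes, labels, scores):
        per_class[int(lbl)].append(np.append(np.asarray(bbox, dtype=float), score))
    return [np.stack(el) if len(el) > 0 else np.zeros((0, 5)) for el in per_class]


def all_boxes_to_df(all_boxes, seg_idx):
    """Flatten the per-class layout into one row per detection."""
    rows = []
    for cls, cls_boxes in enumerate(all_boxes):
        for x1, y1, x2, y2, conf in cls_boxes:
            rows.append((x1, y1, x2, y2, conf, cls, seg_idx))
    return pd.DataFrame(rows, columns=['x1', 'y1', 'x2', 'y2', 'conf', 'cls', 'seg_idx'])


all_boxes = to_all_boxes(boxes=[[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 4, 4]],
                         labels=[2, 2, 1],
                         scores=[0.9, 0.6, 0.8],
                         num_classes=4)
pred_df = all_boxes_to_df(all_boxes, seg_idx=3)
```

The data frame form is what the collection-level evaluation consumes, since it allows filtering by `cls` and `seg_idx` with ordinary pandas selections.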
@@ -0,0 +1,187 @@
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+
+from .sign_evaluation_prep import (get_pred_boxes_df, get_gt_boxes_df)
+from .sign_evaluation import (eval_detector, eval_detector_on_collection, prepare_eval_df,
+                              compute_global_ap, compute_mean_ap)
+
+
+# *BASIC* EVALUATION
+
+class SignEvalBasic(object):
+    def __init__(self, model_version, collection_name, eval_ovthresh=0.5):
+
+        self.model_version = model_version
+        self.coll_name = collection_name
+
+        self.eval_ovthresh = eval_ovthresh
+
+        self.list_seg_mean_ap = []
+        self.list_df_stats = []
+        #self.list_seg_name_with_anno = []
+
+        self.list_pred_boxes_df = []
+        self.list_gt_boxes_df = []
+
+        self.gt_boxes_df = pd.DataFrame()
+        self.pred_boxes_df = pd.DataFrame()
+
+    def eval_segment(self, all_boxes, gt_boxes, gt_labels, seg_idx, verbose=True):
+        # evaluate and print stats
+        acc, df_stats = eval_detector(gt_boxes, gt_labels, all_boxes, ovthresh=self.eval_ovthresh, verbose=verbose)
+
+        # collect results
+        self.list_seg_mean_ap.append(df_stats['ap'].mean())
+        self.list_df_stats.append(df_stats)
+        #list_seg_name_with_anno.append(image_name + view_desc)
+
+        # prepare full collection evaluation
+        if type(all_boxes[0]) is np.ndarray:
+            self.list_pred_boxes_df.append(get_pred_boxes_df([el.tolist() for el in all_boxes], seg_idx))
+            self.list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_idx))
+        else:
+            self.list_pred_boxes_df.append(get_pred_boxes_df(all_boxes, seg_idx))
+            self.list_gt_boxes_df.append(get_gt_boxes_df(gt_boxes, gt_labels, seg_idx))
+
+    def prepare_eval_collection(self):
+        self.gt_boxes_df = pd.concat(self.list_gt_boxes_df, ignore_index=True)
+        self.pred_boxes_df = pd.concat(self.list_pred_boxes_df, ignore_index=True)
+
+    def eval_collection(self, verbose=True):
+        self.prepare_eval_collection()
+        acc, df_stats = eval_detector_on_collection(self.gt_boxes_df, self.pred_boxes_df, ovthresh=self.eval_ovthresh)
+        return acc, df_stats
+
+
+# *FAST* EVALUATION
+
+class SignEvalFast(object):
+    def __init__(self, model_version, collection_name, tp_thresh=0.5, bg_thresh=0.2, num_classes=240):
+
+        self.model_version = model_version
+        self.coll_name = collection_name
+
+        self.tp_thresh = tp_thresh
+        self.bg_thresh = bg_thresh
+        self.num_classes = num_classes
+
+        self.list_seg_mean_ap = []
+
+        self.list_eval_df = []
+        self.list_gt_boxes_df = []
+        self.list_seg_global_ap = []
+
+        self.col_eval_df = pd.DataFrame()
+        self.gt_boxes_df = pd.DataFrame()
+
+    def eval_segment(self, all_boxes, gt_boxes, gt_labels, seg_idx, verbose=True):
+        # get eval_df
+        eval_df = prepare_eval_df(all_boxes, gt_boxes, gt_labels, seg_idx, self.tp_thresh, self.bg_thresh)
+        # get gt_df
+        gt_df = get_gt_boxes_df(gt_boxes, gt_labels, seg_idx)
+
+        mean_ap, global_ap, mean_ap_align = 0., 0., 0.
+        if len(eval_df) > 0 and len(gt_df[gt_df.cls > 0]) > 0:
+            # eval
+            det_stats = compute_mean_ap(eval_df, gt_df, self.num_classes, verbose=False)
+            global_ap = compute_global_ap(eval_df, gt_df, self.num_classes, verbose=False)
+            df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
+            mean_ap = np.mean(df_stats.ap)
+            # mean_ap_align = np.mean(df_stats.ap[df_stats.ap.nonzero()[0]])
+            mean_ap_align = np.mean(df_stats.ap[df_stats.num_det > 0])  # only consider classes with detections
+            if verbose:
+                print ('mAP {:.4} | global AP: {:.4} | mAP (align): {:.4}'.format(mean_ap, global_ap, mean_ap_align))
+                print ("total_tp: {} | total_fp: {} [{}] | acc: {:.2}".format(*get_summary(eval_df, gt_df)))
+        else:
+            if verbose:
+                print ('mAP {:.4} | global AP: {:.4} | mAP (align): {}'.format(mean_ap, global_ap, mean_ap_align))
+                print ("total_tp: {} | total_fp: {} [{}] | acc: {:.2}".format(0, 0, 0, 0.))
+
+        # append
+        self.list_seg_mean_ap.append(mean_ap)
+        self.list_seg_global_ap.append(global_ap)
+        self.list_eval_df.append(eval_df)
+        self.list_gt_boxes_df.append(gt_df)
+
+    def prepare_eval_collection(self, verbose=False):
+        if len(self.col_eval_df) == 0:
+            if len(self.list_eval_df) > 0:  # only concat if there is anything to concat
+                # concat dataframes
+                self.col_eval_df = pd.concat(self.list_eval_df)
+                self.gt_boxes_df = pd.concat(self.list_gt_boxes_df, ignore_index=True)
+
+        if verbose:
+            print(self.col_eval_df.det_type.value_counts())
+            print("num det:", len(self.col_eval_df))
+            print("num TP (without double detections):",
+                  len(self.col_eval_df[(self.col_eval_df.max_det == True)
+                                       & (self.col_eval_df.det_type == 3)]))
+
+    def eval_collection(self, verbose=True):
+        # concat dataframes
+        self.prepare_eval_collection()
+
+        global_ap = 0
+        df_stats = pd.DataFrame()
+        if len(self.gt_boxes_df) > 0:
+            # full collection eval
+            det_stats = compute_mean_ap(self.col_eval_df, self.gt_boxes_df, self.num_classes, verbose=False)
+            global_ap = compute_global_ap(self.col_eval_df, self.gt_boxes_df, self.num_classes, verbose=False)
+            df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
+            mean_ap = np.mean(det_stats[:, -2])
+            mean_ap_align = np.mean(df_stats.ap[df_stats.num_det > 0])  # only consider classes with detections
+            if verbose:
+                print('{} | {}'.format(self.coll_name, self.model_version))
+                print('RESULTS ON FULL COLLECTION :')
+                print ('mAP {:.4} | global AP: {:.4} | mAP (align): {:.4}'.format(mean_ap, global_ap, mean_ap_align))
+                print ("total_tp: {} | total_fp: {} [{}] | prec: {:.3}".format(*self.get_col_summary()))
+
+        return df_stats, global_ap
+
+    def eval_collection_class_freq(self, freq_classes_list):
+        # freq_classes_list: sorted list of most frequent classes (in descending order)
+
+        # concat dataframes
+        self.prepare_eval_collection()
+
+        # compute mAP for different sets of topk most frequent classes
+        topk_list = [2, 4, 8, 16, 32, 64, 128, 192, 256]
+        topk_mAP_list = []
+        for topk in topk_list:
+            print("over {} most freq classes".format(topk))
+            det_stats = compute_mean_ap(self.col_eval_df, self.gt_boxes_df, self.num_classes,
+                                       class_list=freq_classes_list[:topk])
+            mean_ap = np.mean(det_stats[:, -2])
+            topk_mAP_list.append(mean_ap)
+        # plot
+        plt.figure()
+        plt.plot(topk_list, topk_mAP_list, "o-")
+        plt.title('{} - {}'.format(self.coll_name, self.model_version))
+        plt.ylabel('mAP')
+        plt.xlabel('topk')
+        # plt.xscale('log')
+
+    def get_seg_summary(self, didx):
+        """ didx: index of segment in list of segments to evaluate """
+        num_tp, num_fp, num_fp_global, acc = get_summary(self.list_eval_df[didx], self.list_gt_boxes_df[didx])
+        mean_ap = self.list_seg_mean_ap[didx]
+        global_ap = self.list_seg_global_ap[didx]
+        return num_tp, num_fp, num_fp_global, acc, mean_ap, global_ap
+
+    def get_col_summary(self):
+        num_tp, num_fp, num_fp_global, acc = get_summary(self.col_eval_df, self.gt_boxes_df)
+        return num_tp, num_fp, num_fp_global, acc
+
+
+def get_summary(col_eval_df, gt_boxes_df):
+    if len(gt_boxes_df) > 0 and len(col_eval_df) > 0:
+        select_tp = (col_eval_df.det_type == 3) & (col_eval_df.max_det == True)
+        select_fp = (~select_tp) & col_eval_df.pred.isin(gt_boxes_df.cls.unique())
+        num_tp = select_tp.sum()
+        num_fp = select_fp.sum()
+        num_fp_global = (~select_tp).sum()
+        return num_tp, num_fp, num_fp_global, num_tp / float(num_tp + num_fp)
+    else:
+        return 0, 0, 0, 0.
+
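`get_summary` above relies on two columns prepared earlier: `det_type` (0: LOC, 1: SIM/OTH, 2: BG, 3: TP) and `max_det`, which marks the highest-scoring TP per ground truth box so that the remaining TPs count as double detections. The following is a minimal vectorized sketch of how those columns can be derived. `classify_detections` is a hypothetical name, the thresholds mirror the defaults used above, and the six-row frame is invented for illustration.

```python
import numpy as np
import pandas as pd


def classify_detections(eval_df, tp_thresh=0.5, bg_thresh=0.2):
    """Add det_type (0: LOC, 1: SIM/OTH, 2: BG, 3: TP) and max_det columns."""
    same_cls = (eval_df.pred == eval_df.true).values
    overlap = eval_df.overlap.values
    det_type = np.full(len(eval_df), 2)              # default: background confusion
    det_type[~same_cls & (overlap > bg_thresh)] = 1  # wrong class, some overlap
    det_type[same_cls & (overlap > bg_thresh)] = 0   # right class, poor localization
    det_type[same_cls & (overlap > tp_thresh)] = 3   # right class, good overlap
    out = eval_df.copy()
    out['det_type'] = det_type
    out['max_det'] = False
    tp_rows = out[out.det_type == 3]
    if len(tp_rows) > 0:
        # only the best-scoring TP per ground truth box counts as the match
        best = tp_rows.groupby('gt_idx').score.idxmax()
        out.loc[best.values, 'max_det'] = True
    return out


eval_df = pd.DataFrame({
    'overlap': [0.9, 0.7, 0.3, 0.4, 0.1, 0.05],
    'true':    [5, 5, 5, 5, 7, 7],
    'pred':    [5, 5, 5, 6, 7, 8],
    'score':   [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
    'gt_idx':  [0, 0, 0, 0, 1, 1],
})
out = classify_detections(eval_df)
num_tp = int(((out.det_type == 3) & out.max_det).sum())
num_fp = int(len(out) - num_tp)
```

Here two detections reach TP-level overlap with ground truth box 0, but only the higher-scoring one is counted, so `num_tp` is 1 and the other five detections are false positives.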
@@ -0,0 +1,129 @@
+import numpy as np
+import pandas as pd
+
+import editdistance
+import Levenshtein
+from nltk.translate.bleu_score import sentence_bleu
+from nltk.translate.bleu_score import SmoothingFunction
+
+from .sign_evaluation import evaluate_on_gt
+
+# deprecated (should be handled by sign_evaluator)
+def get_eval_stats(gt_boxes, gt_labels, aligned_list):
+    # evaluate
+    num_imgs = 1
+    all_tp, all_fp, det_stats, total_num_tp = evaluate_on_gt(gt_boxes, gt_labels,
+                                                             num_imgs, [[el] for el in aligned_list])
+
+    total_num_fp = int(np.sum(np.array(det_stats)[:, 3]))
+    # print stats
+    pd.set_option('display.max_rows', 50)
+    df_stats = pd.DataFrame(det_stats, columns=['num_gt', 'num_det', 'tp', 'fp', 'ap', 'lbl'])
+
+    acc = total_num_tp / float(total_num_tp + total_num_fp)  # here precision == accuracy
+    nonzero_ap = df_stats['ap'][df_stats['ap'] != 0]
+    print("total_tp", total_num_tp, "total_fp", total_num_fp,
+          "acc", '{:0.2f}'.format(acc),
+          "mAP", '{:0.4f}'.format(df_stats['ap'].mean()),
+          "mAP(nonzero)", '{:0.4f}'.format(nonzero_ap.mean()))
+
+    return acc, df_stats
+
+# deprecated (should be handled by sign_evaluator)
+def compute_accuracy(gt_boxes, gt_labels, aligned_list, return_stats=False):
+    # only run if gt available
+    if len(gt_boxes) > 0:
+        acc, df_stats = get_eval_stats(gt_boxes, gt_labels, aligned_list)
+        if return_stats:
+            return acc, df_stats
+        else:
+            return acc
+    else:
+        return -1
+
+
+def convert_alignments_for_eval(detections, total_labels=240):
+    # convert from RANSAC format (Nx9) to all_boxes
+    all_boxes = [[] for _ in range(total_labels)]
+
+    for temp in detections:
+        # temp: [ID, cx, cy, score, x1, y1, x2, y2, idx]
+
+        # copy data to _new_ all_boxes
+        box = np.zeros((1, 5))
+        box[0, :4] = temp[4:8]
+        box[0, 4] = temp[3]
+        all_boxes[int(temp[0])].append(box)
+
+    # for each class stack list of bounding boxes together
+    all_boxes = [np.stack(el).squeeze(axis=1) if len(el) > 0 else el for el in all_boxes]
+
+    return all_boxes
+
+
+#  SCORE FUNCTIONS
+
+
+def compute_bleu_score(candidate_words, reference_words):
+    reference = [reference_words]
+    candidate = candidate_words
+    # compute score
+
+    # deal with issue
+    # https://github.com/nltk/nltk/issues/1554
+    hyp_lengths = len(reference_words)
+    weights = (0.25, 0.25, 0.25, 0.25)
+    if hyp_lengths < 4:
+        if hyp_lengths == 0:
+            weights = (0, )
+        else:
+            weights = (1 / float(hyp_lengths), ) * hyp_lengths
+
+    chencherry = SmoothingFunction()
+    score = sentence_bleu(reference, candidate, weights=weights, smoothing_function=chencherry.method1)
+    return score
+
+
+def compute_levenshtein(candidate, reference, normalize=True):
+    edist = 0
+    if len(reference) > 0:
+        # strict normalization in [0,1] range
+        edist = editdistance.eval(reference, candidate)
+        if normalize:
+            edist = float(edist) / max(len(reference), len(candidate))
+    return edist
+
+
+def compute_cer(candidate, reference):
+    # character error rate (see also WER)
+    # character accuracy 1 - CER
+    edist = 0
+    if len(reference) > 0:
+        edist = editdistance.eval(reference, candidate)
+        edist = float(edist) / len(reference)
+    return edist
+
+
+def compute_levenshtein_ops(candidate, reference, normalize=True):
+    # https://rawgit.com/ztane/python-Levenshtein/master/docs/Levenshtein.html
+    ops_dict = {'insert': 0, 'delete': 1, 'replace': 2}
+    # print candidate, reference
+    edist = 0
+    edit_ops = np.zeros(len(ops_dict))
+    if len(reference) > 0:
+        # convert to string for Levenshtein function
+        candidate_str = u''.join([unichr(lbl) for lbl in candidate])
+        reference_str = u''.join([unichr(lbl) for lbl in reference])
+        # compute ed ops
+        ops_df = pd.DataFrame(Levenshtein.editops(candidate_str, reference_str), columns=['type', 'ixA', 'ixB'])
+        edist = len(ops_df)
+        # collect types
+        for op, ii in ops_dict.items():
+            edit_ops[ii] = len(ops_df[ops_df.type == op])
+        if normalize:
+            edist = float(edist) / max(len(reference), len(candidate))
+
+    return edist, edit_ops
+
+
+
@@ -0,0 +1,7 @@
+### Network architectures 
+
+- `linenet.py` : a modified AlexNet used for line segmentation
+- `mobilenetv2_mod03.py` : a modified MobileNetV2 used as backbone for the sign detector
+- `mobilenetv2_fpn.py` : a FPN network wrapper for the backbone architecture
+- `trained_model_loader.py` : contains functions to load sign detector and line segmentation models; 
+describes how detectors are assembled from parts.
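The loader `get_line_net_fcn` builds a `LineNet` and then wraps it in `LineNetFCN`, whose `net_surgery` step copies the fully connected weights into convolution layers. The following is a minimal, dependency-free sketch of that idea, not the repository's implementation. It assumes a single input channel and toy sizes, and the helper names `fc_forward`, `fc_to_conv` and `conv_forward` are hypothetical.

```python
# Sketch of "convolutionalization": a fully connected (FC) layer that was
# trained on k x k patches is reshaped into k x k convolution kernels.
# All helper names are hypothetical and not part of this repository.


def fc_forward(weights, bias, patch):
    """Apply an FC layer to a flattened k x k patch (single channel)."""
    flat = [v for row in patch for v in row]
    return [sum(w * v for w, v in zip(w_row, flat)) + b
            for w_row, b in zip(weights, bias)]


def fc_to_conv(weights, k):
    """Reshape each FC weight row (length k*k) into a k x k kernel."""
    return [[w_row[r * k:(r + 1) * k] for r in range(k)] for w_row in weights]


def conv_forward(kernels, bias, image):
    """Stride-1 convolution without padding; one score map per kernel."""
    k = len(kernels[0])
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    maps = []
    for kernel, b in zip(kernels, bias):
        score_map = [[sum(kernel[i][j] * image[y + i][x + j]
                          for i in range(k) for j in range(k)) + b
                      for x in range(out_w)]
                     for y in range(out_h)]
        maps.append(score_map)
    return maps


k = 2
fc_weights = [[1.0, 0.0, -1.0, 2.0], [0.5, 0.5, 0.5, 0.5]]  # 2 outputs, k*k inputs
fc_bias = [0.1, -0.2]
kernels = fc_to_conv(fc_weights, k)

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]

# On an input larger than k x k the converted layer yields a score map.
score_maps = conv_forward(kernels, fc_bias, image)

# At each location the map equals the FC output on the patch underneath.
top_left_patch = [row[:k] for row in image[:k]]
fc_scores = fc_forward(fc_weights, fc_bias, top_left_patch)
```

Here `score_maps[c][0][0]` matches `fc_scores[c]` for every output `c`, which is why a classifier trained on fixed-size crops can be evaluated densely on a larger image once it is converted.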
@@ -0,0 +1,192 @@
+import torch
+import torch.nn as nn
+import torch.nn.init as init
+import torch.nn.functional as F
+
+
+# HELPER FUNCTIONS
+
+def initialize_weights(model):
+    for m in model.modules():
+        if isinstance(m, nn.Conv2d):
+            # n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
+            # m.weight.data.normal_(0, math.sqrt(2. / n))
+            # init.xavier_normal(m.weight.data)
+            # init.kaiming_normal(m.weight.data)
+            init.normal_(m.weight.data, std=0.01)
+            # check if bias = True
+            if hasattr(m.bias, 'data'):
+                m.bias.data.zero_()
+        elif isinstance(m, nn.Linear):
+            m.weight.data.normal_(0, 0.005)
+            m.bias.data.zero_()
+        elif isinstance(m, nn.BatchNorm2d):
+            # check if affine = True
+            if hasattr(m.bias, 'data'):
+                m.weight.data.fill_(1)
+                m.bias.data.zero_()
+
+
+def copy_layer_params(target, source):
+    """ Copy layer parameters from source to target; size of arrays needs to match! """
+    target.weight.data.copy_(source.weight.data.view(target.weight.size()))
+    target.bias.data.copy_(source.bias.data.view(target.bias.size()))
+
+
+# HELPER MODULES
+
+
+class LRN(nn.Module):
+    def __init__(self, local_size=1, alpha=1.0, beta=0.75, ACROSS_CHANNELS=False):
+        super(LRN, self).__init__()
+        self.ACROSS_CHANNELS = ACROSS_CHANNELS
+        if self.ACROSS_CHANNELS:
+            # make it work with pytorch 0.2.X # hacky!!! should be ConstantPadding
+            # self.average = nn.Sequential(
+            #     nn.ReplicationPad3d(padding=(0, 0, 0, 0, int((local_size - 1.0) / 2), int((local_size - 1.0) / 2))),
+            #     nn.AvgPool3d(kernel_size=(local_size, 1, 1), stride=1),
+            # )
+            self.average = nn.AvgPool3d(kernel_size=(local_size, 1, 1),
+                                        stride=1,
+                                        padding=(int((local_size - 1.0) / 2), 0, 0))
+        else:
+            self.average = nn.AvgPool2d(kernel_size=local_size,
+                                        stride=1,
+                                        padding=int((local_size - 1.0) / 2))
+        self.alpha = alpha
+        self.beta = beta
+
+    def forward(self, x):
+        if self.ACROSS_CHANNELS:
+            div = x.pow(2).unsqueeze(1)
+            div = self.average(div).squeeze(1)
+            div = div.mul(self.alpha).add(1.0).pow(self.beta)
+        else:
+            div = x.pow(2)
+            div = self.average(div)
+            div = div.mul(self.alpha).add(1.0).pow(self.beta)
+        x = x.div(div)
+        return x
+
+
+class Softmax3D(nn.Module):
+    def forward(self, input_):
+        batch_size = input_.size()[0]
+        output_ = torch.stack([F.softmax(input_[i], dim=0) for i in range(batch_size)], 0)
+        return output_
+
+
+# MAIN MODULES
+
+
+class LineNet(nn.Module):
+
+    def __init__(self, num_classes=1000, input_channels=3):
+        super(LineNet, self).__init__()
+        self.features = nn.Sequential(
+            nn.Conv2d(input_channels, 64, kernel_size=11, stride=4, padding=0),
+            nn.ReLU(inplace=True),
+            nn.MaxPool2d(kernel_size=3, stride=2),
+            LRN(alpha=1e-4, beta=0.75, local_size=1),
+            nn.Conv2d(64, 256, kernel_size=5, padding=2),
+            nn.ReLU(inplace=True),
+            nn.MaxPool2d(kernel_size=3, stride=2),
+            LRN(alpha=1e-4, beta=0.75, local_size=1),
+            nn.Conv2d(256, 384, kernel_size=3, padding=1),
+            nn.BatchNorm2d(384, affine=False, momentum=.1),
+            nn.ReLU(inplace=True),
+            nn.Conv2d(384, 384, kernel_size=3, padding=1),
+            nn.BatchNorm2d(384, affine=False, momentum=.1),
+            nn.ReLU(inplace=True),
+            nn.Conv2d(384, 256, kernel_size=3, padding=1),
+            nn.BatchNorm2d(256, affine=False, momentum=.1),
+            nn.ReLU(inplace=True),
+            nn.MaxPool2d(kernel_size=3, stride=2),
+        )
+        self.fc6 = nn.Linear(256 * 6 * 6, 512)
+        self.score = nn.Linear(512, 240)
+        self.classifier = nn.Sequential(
+            self.fc6,
+            nn.ReLU(inplace=True),
+            nn.Dropout(),
+            self.score,
+        )
+        self.line_score = nn.Linear(512, num_classes)
+        self.line_classifier = nn.Sequential(
+            self.fc6,
+            nn.ReLU(inplace=True),
+            nn.Dropout(),
+            self.line_score,
+        )
+        initialize_weights(self)
+
+    def legacy_forward(self, x):
+        x = self.features(x)
+        x = x.view(x.size(0), 256 * 6 * 6)  # x = x.view(x.size(0), -1)
+        x = self.classifier(x)
+
+        return x
+
+    def forward(self, x):
+        x = self.features(x)
+        x = x.view(x.size(0), 256 * 6 * 6)  # x = x.view(x.size(0), -1)
+        x = self.line_classifier(x)
+
+        return x
+
+
+class LineNetFCN(nn.Module):
+    def __init__(self, original_model, num_classes=240):
+        super(LineNetFCN, self).__init__()
+
+        # simple assign !no copy! (could use copy.deepcopy(), but assume original_model is not used anymore)
+        self.features = original_model.features
+
+        # create new module and assign features
+        # original_net = CuneiNet(input_channels=1)
+        # self.features = original_net.features
+        # self.features.load_state_dict(original_model.features.state_dict())
+
+        # softmax function
+        self.softmax = nn.Softmax2d()    ## Softmax3D(),
+
+        # create fcn head
+        self.classifier = nn.Sequential(
+            nn.Conv2d(256, 512, kernel_size=6, padding=0),
+            nn.ReLU(inplace=True),
+            # nn.Dropout(), DO NOT USE 1d dropout!!!
+            nn.Dropout2d(),
+            nn.Conv2d(512, num_classes, kernel_size=1, padding=0),
+            # self.softmax  # not here to
+        )
+
+        # perform net surgery
+        self.net_surgery(original_model)
+
+    def forward(self, x):
+        x = self.features(x)
+        x = self.classifier(x)
+        x = self.softmax(x)
+        # batch_size = x.size()[0]
+        # x = torch.stack([F.softmax(x[i]) for i in range(batch_size)], 0)
+        return x
+
+    def get_conv_features(self, x):
+        x = self.features(x)
+        return x
+
+    def get_fc_features(self, x):
+        x = self.features(x)
+        x = self.classifier(x)
+        return x
+
+    def net_surgery(self, original_model):
+        """ perform net surgery
+                original.classifier --> fcn.classifier
+        """
+        for i, l1 in enumerate(original_model.line_classifier):
+            if isinstance(l1, nn.Linear):
+                l2 = self.classifier[i]
+                # l2.weight.data.copy_(l1.weight.data.view(l2.weight.size()))
+                # l2.bias.data.copy_(l1.bias.data.view(l2.bias.size()))
+                copy_layer_params(l2, l1)
@@ -0,0 +1,93 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import math
+
+
+class MobileNetV2FPN(nn.Module):
+    def __init__(self, original_model, num_classes=240, width_mult=1, with_p4=False):
+        super(MobileNetV2FPN, self).__init__()
+
+        # simple assign !no copy! (could use copy.deepcopy(), but assume original_model is not used anymore)
+        self.features = original_model.features
+
+        self.conv6 = nn.Conv2d(512, 256, kernel_size=3, stride=2, padding=1)
+
+        # Top-down layers
+        self.toplayer = nn.Conv2d(512, 256, kernel_size=1, stride=1, padding=0)
+
+        self.with_p4 = with_p4
+        if self.with_p4:
+            # Lateral layers
+            self.latlayer1 = nn.Conv2d(int(32*width_mult), 256, kernel_size=1, stride=1, padding=0)
+
+            # Smooth layers
+            self.smooth1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
+
+        # init weights (exclude features) TODO
+        self._initialize_weights(['conv6', 'toplayer', 'latlayer1', 'smooth1'])
+
+    def forward(self, x):
+        for i in range(3):
+            x = self.features[i](x)
+        c4 = self.features[3](x)  # 14 x 32*width_mult (expansion factor does not affect output of block)
+
+        x = self.features[4](c4)
+        x = self.features[5](x)
+        x = self.features[6](x)  # 7 x 160*width_mult
+        c5 = self.features[7](x)  # 7 x 512
+        p6 = self.conv6(c5)
+
+        # Top-down
+        p5 = self.toplayer(c5)
+
+        if self.with_p4:
+            p4 = self._upsample_add(p5, self.latlayer1(c4))
+            p4 = self.smooth1(p4)
+            return p4, p5, p6
+        else:
+            return p5, p6
+
+    def _upsample_add(self, x, y):
+        '''Upsample and add two feature maps.
+
+        Args:
+          x: (Variable) top feature map to be upsampled.
+          y: (Variable) lateral feature map.
+
+        Returns:
+          (Variable) added feature map.
+
+        Note: in PyTorch, when the input size is odd, the feature map upsampled
+        with `F.upsample(..., scale_factor=2, mode='nearest')`
+        may not match the lateral feature map size.
+
+        e.g.
+        original input size: [N,_,15,15] ->
+        conv2d feature map size: [N,_,8,8] ->
+        upsampled feature map size: [N,_,16,16]
+
+        So we choose bilinear upsample which supports arbitrary output sizes.
+        '''
+        _, _, H, W = y.size()
+        return F.interpolate(x, size=(H, W), mode='bilinear', align_corners=False) + y
+
+    def _initialize_weights(self, name_list):
+        for name, m in self.named_modules():
+            # only init modules in name_list
+            if name in name_list:
+                # exclude self.features, Mobile_blocks
+                if isinstance(m, nn.Conv2d):
+                    n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
+                    m.weight.data.normal_(0, math.sqrt(2. / n))
+                    if m.bias is not None:
+                        m.bias.data.zero_()
+                elif isinstance(m, nn.BatchNorm2d):
+                    m.weight.data.fill_(1)
+                    m.bias.data.zero_()
+                elif isinstance(m, nn.Linear):
+                    n = m.weight.size(1)
+                    m.weight.data.normal_(0, 0.01)
+                    m.bias.data.zero_()
+
+
@@ -0,0 +1,160 @@
+import torch.nn as nn
+import math
+
+
+def conv_bn(inp, oup, stride):
+    return nn.Sequential(
+        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
+        nn.BatchNorm2d(oup),
+        nn.ReLU6(inplace=True)
+    )
+
+
+def conv_1x1_bn(inp, oup):
+    return nn.Sequential(
+        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
+        nn.BatchNorm2d(oup),
+        nn.ReLU6(inplace=True)
+    )
+
+
+class InvertedResidual(nn.Module):
+    def __init__(self, inp, oup, stride, expand_ratio):
+        super(InvertedResidual, self).__init__()
+        self.stride = stride
+        assert stride in [1, 2]
+
+        self.use_res_connect = self.stride == 1 and inp == oup
+
+        self.conv = nn.Sequential(
+            # pw
+            nn.Conv2d(inp, inp * expand_ratio, 1, 1, 0, bias=False),
+            nn.BatchNorm2d(inp * expand_ratio),
+            nn.ReLU6(inplace=True),
+            # dw
+            nn.Conv2d(inp * expand_ratio, inp * expand_ratio, 3, stride, 1, groups=inp * expand_ratio, bias=False),
+            nn.BatchNorm2d(inp * expand_ratio),
+            nn.ReLU6(inplace=True),
+            # pw-linear
+            nn.Conv2d(inp * expand_ratio, oup, 1, 1, 0, bias=False),
+            nn.BatchNorm2d(oup),
+        )
+
+    def forward(self, x):
+        if self.use_res_connect:
+            return x + self.conv(x)
+        else:
+            return self.conv(x)
+
+
+class MobileBlock(nn.Module):
+    # introduced to simplify creation of FPN
+    def __init__(self, residual_setting, input_channel, output_channel):
+        super(MobileBlock, self).__init__()
+        t, c, n, s = residual_setting
+
+        block_seq = []
+        for i in range(n):
+            if i == 0:
+                block_seq.append(InvertedResidual(input_channel, output_channel, s, t))
+            else:
+                block_seq.append(InvertedResidual(input_channel, output_channel, 1, t))
+            input_channel = output_channel
+        self.mobile_block = nn.Sequential(*block_seq)
+        self.output_channel = output_channel
+
+    def forward(self, x):
+        return self.mobile_block(x)
+
+
+class MobileNetV2(nn.Module):
+    def __init__(self, n_class=1000, input_size=224, input_dim=1, width_mult=1., arch_opt=1):
+        super(MobileNetV2, self).__init__()
+        # setting of inverted residual blocks
+        self.interverted_residual_setting = [
+            # t, c, n, s
+            [1, 16, 1, 2],
+            #[1, 16, 1, 1],
+            [6, 24, 2, 2],
+            [6, 32, 3, 2],
+            [6, 64, 4, 2],
+            [6, 96, 3, 1],
+            [6, 160, 3, 1],
+            # [6, 320, 1, 1],
+        ]
+
+        # set arch option
+        self.arch_opt = arch_opt
+
+        # building first layer
+        assert input_size % 32 == 0
+        input_channel = int(32 * width_mult)
+
+        # self.last_channel = int(1280 * width_mult) if width_mult > 1.0 else 1280
+        if self.arch_opt == 1:
+            self.last_channel = int(512 * width_mult) if width_mult > 1.0 else 512
+        elif self.arch_opt == 2:
+            self.last_channel = int(256 * width_mult) if width_mult > 1.0 else 256
+
+        self.features = [conv_bn(input_dim, input_channel, 2)]
+        # building inverted residual blocks
+        for ii, residual_setting in enumerate(self.interverted_residual_setting):
+            t, c, n, s = residual_setting
+            output_channel = int(c * width_mult)
+            new_block = MobileBlock(residual_setting, input_channel, output_channel)
+            self.features.append(new_block)
+            input_channel = new_block.output_channel
+
+        # building last several layers
+        self.features.append(conv_1x1_bn(input_channel, self.last_channel))
+
+        if self.arch_opt == 1:
+            self.features.append(nn.AvgPool2d(input_size // 32, stride=1))
+            # need stride=1 for FCN (because default stride is kernel_sz)
+            # self.features.append(nn.MaxPool2d(kernel_size=input_size / 32, stride=1))
+
+            # building classifier
+            self.classifier = nn.Sequential(
+                nn.Dropout(),
+                nn.Linear(self.last_channel, n_class),
+            )
+
+        elif self.arch_opt == 2:
+            # building classifier
+            self.classifier = nn.Sequential(
+                nn.Linear(self.last_channel * 7 * 7, 384),
+                nn.ReLU(inplace=True),
+                nn.Dropout(),
+                nn.Linear(384, n_class),
+            )
+
+        # make it nn.Sequential
+        self.features = nn.Sequential(*self.features)
+
+        self._initialize_weights()
+
+    def forward(self, x):
+        for layer in self.features:
+            x = layer(x)
+
+        # x = x.view(-1, self.last_channel)
+        x = x.view(x.size(0), -1)
+        x = self.classifier(x)
+        return x
+
+    def _initialize_weights(self):
+        for m in self.modules():
+            if isinstance(m, nn.Conv2d):
+                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
+                m.weight.data.normal_(0, math.sqrt(2. / n))
+                if m.bias is not None:
+                    m.bias.data.zero_()
+            elif isinstance(m, nn.BatchNorm2d):
+                m.weight.data.fill_(1)
+                m.bias.data.zero_()
+            elif isinstance(m, nn.Linear):
+                n = m.weight.size(1)
+                m.weight.data.normal_(0, 0.01)
+                m.bias.data.zero_()
+
+
@@ -0,0 +1,99 @@
+import torch
+import os
+
+from .linenet import LineNet, LineNetFCN
+
+from .mobilenetv2_mod03 import MobileNetV2
+from .mobilenetv2_fpn import MobileNetV2FPN
+
+from ..utils.torchcv.models.net import FPNSSD
+from ..utils.torchcv.models.rpn_net import RPN
+
+
+def get_cunei_net_basic(model_version, device, arch_type, arch_opt=1, width_mult=0.5,
+                       relative_path='../../', num_classes=240, num_c=1):
+
+    # create classifier model
+    basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=num_c,
+                                arch_opt=arch_opt)
+
+    # load pretrained weights
+    weights_path = '{}results/weights/cuneiNet_basic_{}.pth'.format(relative_path, model_version)
+    basic_net.load_state_dict(torch.load(weights_path, map_location=device))  # , strict=False
+
+    # deploy to device and switch to evaluation mode
+    basic_net.to(device)
+    basic_net.eval()  # ATTENTION!
+
+    return basic_net
+
+
+def get_line_net_fcn(model_version, device, relative_path='../../', num_classes=2, num_c=1):
+
+    # choose model filename
+    weights_path = '{}results/weights/lineNet_basic_{}.pth'.format(relative_path, model_version)
+    assert os.path.exists(weights_path), "File '{}' not found!".format(weights_path)
+
+    # load model definition
+    model_ft = LineNet(num_classes=num_classes, input_channels=num_c)
+
+    # load model weights
+    model_ft.load_state_dict(torch.load(weights_path, map_location=device), strict=False)
+
+    # create fully-convolutional version (convolutionalize)
+    model_fcn = LineNetFCN(model_ft, num_classes)
+
+    # deploy model to device
+    model_fcn = model_fcn.to(device)
+
+    # switch model to evaluation mode
+    # model_fcn.train(False)
+    model_fcn.eval()
+
+    return model_fcn
+
+
+def get_fpn_ssd_net(model_version, device, arch_type, with_64, arch_opt=1, width_mult=0.5,
+                    relative_path='../../', num_classes=240, num_c=1, rnd_init_model=False):
+    # create classifier model
+    basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=num_c,
+                            arch_opt=arch_opt)
+
+    # create FPN model with classifier model
+    fpn_net = MobileNetV2FPN(basic_net, num_classes=num_classes, width_mult=width_mult, with_p4=with_64)
+
+    # load full detector net
+    fpnssd_net = FPNSSD(fpn_net, num_classes)
+    if not rnd_init_model:
+        # load pretrained weights
+        weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)
+        fpnssd_net.load_state_dict(torch.load(weights_path, map_location=device))  # , strict=False
+
+    # deploy to device and switch to evaluation mode
+    fpnssd_net.to(device)
+    fpnssd_net.eval()
+
+    return fpnssd_net
+
+
+def get_rpn_net(model_version, device, arch_type, with_64, arch_opt=1, width_mult=0.5,
+                    relative_path='../../', num_classes=240, num_c=1):
+    # create classifier model
+    basic_net = MobileNetV2(input_size=224, width_mult=width_mult, n_class=num_classes, input_dim=num_c,
+                            arch_opt=arch_opt)
+
+    # create FPN model with classifier model
+    fpn_net = MobileNetV2FPN(basic_net, num_classes=num_classes, width_mult=width_mult, with_p4=with_64)
+
+    # load full detector net
+    rpn_net = RPN(fpn_net, num_classes, with_64)
+    # load pretrained weights
+    weights_path = '{}results/weights/fpn_net_{}.pth'.format(relative_path, model_version)
+    rpn_net.load_state_dict(torch.load(weights_path, map_location=device))  # , strict=False
+
+    # deploy to device and switch to evaluation mode
+    rpn_net.to(device)
+    rpn_net.eval()
+
+    return rpn_net
+
@@ -0,0 +1,32 @@
+# -*- coding: utf-8 -*-
+"""
+@author: tdencker
+"""
+
+import pandas as pd
+
+
+class SignsStats(object):
+
+    def __init__(self, tblSignHeight=128, stats_csv_file='../../data/unicode_sign_stats.csv'):
+        self.tblSignHeight = tblSignHeight
+        self.sign_df = None
+        # load stats file
+        self.load_stats_from_file(stats_csv_file)
+
+    def load_stats_from_file(self, stats_csv_file):
+        # Load sign stats
+        sign_df = pd.read_csv(stats_csv_file)
+        sign_df = sign_df.set_index('train_lbl')
+        # assign
+        self.sign_df = sign_df
+
+    def get_sign_width(self, train_lbl, sign_width=None):
+        """ Return width of sign from stats """
+        # check default sign width
+        if sign_width is None:
+            sign_width = self.tblSignHeight
+        if train_lbl in self.sign_df.index:
+            sign_width = self.sign_df.width.loc[train_lbl]
+        return sign_width
+
@@ -0,0 +1,51 @@
+import pandas as pd
+from ..utils.path_utils import *
+
+
+class TransliterationSet:
+
+    def __init__(self, collections=[], relative_path='../../'):
+        # load list of coll_tl_df
+        list_coll_tl_df = []
+        for collection in collections:
+            coll_tl_file = '{}data/transliterations/transliterations_{}.csv'.format(relative_path, collection)
+            # check if transliteration exists
+            if os.path.isfile(coll_tl_file):
+                print('Transliteration file {} found!'.format(coll_tl_file))
+                # load transliteration
+                coll_tl_df = pd.read_csv(coll_tl_file)
+                # select subset of columns
+                coll_tl_df = coll_tl_df[['segm_idx', 'tablet_CDLI', 'train_label', 'mzl_label', 'line_idx', 'pos_idx', 'status']]
+                coll_tl_df['lbl'] = coll_tl_df['train_label']
+                coll_tl_df['mzl_lbl'] = coll_tl_df['mzl_label']
+            else:
+                print('Transliteration file {} NOT found!'.format(coll_tl_file))
+                coll_tl_df = pd.DataFrame()
+            # append coll_tl_df to list
+            list_coll_tl_df.append(coll_tl_df)
+        # make accessible
+        self.collections = collections
+        self.list_coll_tl_df = list_coll_tl_df
+
+    def get_tl_df(self, seg_rec, verbose=True):
+        # init empty tl
+        num_lines = 0
+        tl_df = pd.DataFrame()
+        # select corresponding coll_tl_df
+        collection = seg_rec.collection
+        coll_idx = self.collections.index(collection)
+        coll_tl_df = self.list_coll_tl_df[coll_idx]
+        # check if transliterations available
+        if len(coll_tl_df) > 0:
+            # select corresponding tl_df slice in coll_df
+            tl_df = coll_tl_df[coll_tl_df.segm_idx == seg_rec.name]
+            # compute number lines
+            num_lines = tl_df.line_idx.nunique()
+        # report if transliteration is missing
+        if len(tl_df) == 0:
+            if verbose:
+                print('No transliteration found for {}!'.format(seg_rec.tablet_CDLI))
+        return tl_df, num_lines
+
+
+
@@ -0,0 +1,57 @@
+import pandas as pd
+
+
+def load_cunei_mzl_df(path_to_csv='./cunei_mzl.csv', filter=False):
+    cunei_mzl_df = pd.read_csv(path_to_csv, index_col=0)
+    # avoid mzl idx without codepoint
+    cunei_mzl_df = cunei_mzl_df[cunei_mzl_df.num_cpts > 0]
+    # deal with multiple versions
+    #cunei_mzl_df = cunei_mzl_df.groupby('MesZL', sort=False, as_index=False).first()
+    # create composite sign
+    cunei_mzl_df['comp_script'] = cunei_mzl_df[['script_0', 'script_1', 'script_2']].fillna('').apply(
+        lambda x: ''.join(x), axis=1)
+    # decode to unicode (for matching with oracc utf8)
+    cunei_mzl_df.comp_script = cunei_mzl_df.comp_script.apply(lambda x: x.decode('utf8'))
+
+    if filter:
+        # avoid mzl idx without codepoint
+        cunei_mzl_df = cunei_mzl_df[cunei_mzl_df.num_cpts > 0]
+        # deal with multiple versions
+        cunei_mzl_df = cunei_mzl_df.groupby('MesZL', sort=False, as_index=False).first()
+
+    return cunei_mzl_df
+
+
+# def get_unicode(mzl_idx, cunei_mzl_df):
+#     select_mzl_idx = cunei_mzl_df.MesZL.isin([mzl_idx])
+#     if select_mzl_idx.any():
+#         cpt_hex = cunei_mzl_df.codepoint_0[select_mzl_idx].str[2:].values[0]  # get hex
+#         cpt_int = int(cpt_hex, 16)  # convert to int
+#         return unichr(cpt_int)
+#     else:
+#         return mzl_idx
+
+
+def get_unicode_comp(mzl_idx, cunei_mzl_df):
+    # also handle composite signs by concatenation
+    select_mzl_idx = cunei_mzl_df.MesZL.isin([mzl_idx])
+    if select_mzl_idx.any():
+        cunei_rec = cunei_mzl_df[select_mzl_idx]
+        out_str = ''
+        for i in range(int(cunei_rec.num_cpts.values[0])):
+            cpt_hex = cunei_rec['codepoint_{}'.format(i)].str[2:].values[0]  # get hex
+            cpt_int = int(cpt_hex, 16)  # convert to int
+            out_str += unichr(cpt_int)
+        return out_str
+    else:
+        return mzl_idx
+
+
+def get_sign_name(mzl_idx, cunei_mzl_df):
+    select_mzl_idx = cunei_mzl_df.MesZL.isin([mzl_idx])
+    if select_mzl_idx.any():
+        cunei_rec = cunei_mzl_df[select_mzl_idx]
+        #return cunei_rec['Sign Name'].str.decode('utf8').item()
+        return cunei_rec['Sign Name'].str.decode('utf8').str.split('(').str[0].item()
+    else:
+        return mzl_idx
@@ -0,0 +1,28 @@
+import json
+import numpy as np
+
+
+def get_label_list(path_to_lbl_file='../../data/newLabels.json'):
+    # get list that maps old -> new
+
+    # load label list
+    with open(path_to_lbl_file) as json_data:
+        lbl_list = json.load(json_data)
+    return lbl_list
+
+
+def get_lbl2lbl(path_to_lbl_file):
+    # get list that maps new -> old
+    # actually using lbl_list with index function works as well !
+
+    # load label list
+    lbl_list = np.asarray(get_label_list(path_to_lbl_file))
+    # print np.unique(lbl_list)
+    # reverse (assume mapping is unique)
+    lbl2lbl = np.zeros(len(np.unique(lbl_list)), )  # 240
+    for (i, val) in enumerate(lbl_list):
+        lbl2lbl[val] = i  # new -> old
+    # since mapping is not unique for 0, need to set manually to background
+    lbl2lbl[0] = 0
+    return lbl2lbl
+
@@ -0,0 +1,200 @@
+# --------------------------------------------------------
+# Parts of code adapted from Ross Girshick's Fast/er R-CNN code
+# --------------------------------------------------------
+
+import numpy as np
+
+
+def unique_boxes(boxes, scale=1.0):
+    """Return indices of unique boxes."""
+    v = np.array([1, 1e3, 1e6, 1e9])
+    hashes = np.round(boxes * scale).dot(v)
+    _, index = np.unique(hashes, return_index=True)
+    return np.sort(index)
+
+
+def xywh_to_xyxy(boxes):
+    """Convert [x y w h] box format to [x1 y1 x2 y2] format."""
+    return np.hstack((boxes[:, 0:2], boxes[:, 0:2] + boxes[:, 2:4] - 1))
+
+
+def xyxy_to_xywh(boxes):
+    """Convert [x1 y1 x2 y2] box format to [x y w h] format."""
+    return np.hstack((boxes[:, 0:2], boxes[:, 2:4] - boxes[:, 0:2] + 1))
+
+
+def convert_bbox_global2local(gbbox, seg_bbox):
+    relative_bbox = np.array(gbbox) - np.array(seg_bbox[:2] * 2)
+    return relative_bbox.tolist()
+
+
+def convert_bbox_local2global(lbbox, seg_bbox):
+    global_bbox = np.array(lbbox) + np.array(seg_bbox[:2] * 2)
+    return global_bbox.tolist()
+
+
+def validate_boxes(boxes, width=0, height=0):
+    """Check that a set of boxes are valid."""
+    x1 = boxes[:, 0]
+    y1 = boxes[:, 1]
+    x2 = boxes[:, 2]
+    y2 = boxes[:, 3]
+    assert (x1 >= 0).all()
+    assert (y1 >= 0).all()
+    assert (x2 >= x1).all()
+    assert (y2 >= y1).all()
+    assert (x2 < width).all()
+    assert (y2 < height).all()
+
+
+def filter_small_boxes(boxes, min_size):
+    w = boxes[:, 2] - boxes[:, 0]
+    h = boxes[:, 3] - boxes[:, 1]
+    keep = np.where((w >= min_size) & (h > min_size))[0]
+    return keep
+
+
+def clip_boxes(boxes, im_shape):
+    """
+    Clip boxes to image boundaries.
+    usage for single: bb_new = clip_boxes(bb[np.newaxis, :], [imh, imw]).squeeze()
+    """
+
+    # 0 <= x1 < im_shape[1]
+    boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
+    # 0 <= y1 < im_shape[0]
+    boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
+    # 0 <= x2 < im_shape[1]
+    boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
+    # 0 <= y2 < im_shape[0]
+    boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
+    return boxes
+
+
+# def clip_boxes(boxes, im_shape):
+#     """Clip boxes to image boundaries."""
+#     # x1 >= 0
+#     boxes[:, 0::4] = np.maximum(boxes[:, 0::4], 0)
+#     # y1 >= 0
+#     boxes[:, 1::4] = np.maximum(boxes[:, 1::4], 0)
+#     # x2 < im_shape[1]
+#     boxes[:, 2::4] = np.minimum(boxes[:, 2::4], im_shape[1] - 1)
+#     # y2 < im_shape[0]
+#     boxes[:, 3::4] = np.minimum(boxes[:, 3::4], im_shape[0] - 1)
+#     return boxes
+
+
+def intersection_over_union(Reframe, GTframe):
+    # by Oemer
+
+    x1 = Reframe[0]
+    y1 = Reframe[1]
+    width1 = Reframe[2] - Reframe[0]
+    height1 = Reframe[3] - Reframe[1]
+
+    x2 = GTframe[0]
+    y2 = GTframe[1]
+    width2 = GTframe[2] - GTframe[0]
+    height2 = GTframe[3] - GTframe[1]
+
+    endx = max(x1 + width1, x2 + width2)
+    startx = min(x1, x2)
+    width = width1 + width2 - (endx - startx)
+
+    endy = max(y1 + height1, y2 + height2)
+    starty = min(y1, y2)
+    height = height1 + height2 - (endy - starty)
+
+    if width <= 0 or height <= 0:
+        ratio = 0
+    else:
+        Area = width * height
+        Area1 = width1 * height1
+        Area2 = width2 * height2
+        ratio = Area * 1. / (Area1 + Area2 - Area)
+    # return IOU
+    return ratio  # Reframe,GTframe
+
+
+def bb_intersection_over_union(box_a, box_b):
+    # adapted from https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
+
+    # determine the (x, y)-coordinates of the intersection rectangle
+    xA = max(box_a[0], box_b[0])
+    yA = max(box_a[1], box_b[1])
+    xB = min(box_a[2], box_b[2])
+    yB = min(box_a[3], box_b[3])
+
+    # compute the area of intersection rectangle
+    inter_area = (xB - xA + 1) * (yB - yA + 1)
+
+    # compute the area of both the prediction and ground-truth
+    # rectangles
+    box_a_area = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
+    box_b_area = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
+
+    # compute the intersection over union by taking the intersection
+    # area and dividing it by the sum of prediction + ground-truth
+    # areas - the intersection area
+    if (xB - xA + 1) <= 0 or (yB - yA + 1) <= 0:
+        iou = 0
+    else:
+        iou = inter_area / float(box_a_area + box_b_area - inter_area)
+
+    # return the intersection over union value
+    return iou
+
+
+def box_iou(box1, box2):
+    '''Compute the intersection over union of two sets of boxes.
+    TD: modified to be legacy compatible (inclusive pixel coordinates, hence the +1 terms)
+    The box order must be (xmin, ymin, xmax, ymax).
+    Args:
+      box1: (ndarray) bounding boxes, sized [N,4].
+      box2: (ndarray) bounding boxes, sized [M,4].
+    Return:
+      (ndarray) iou, sized [N,M].
+    Reference:
+      https://github.com/chainer/chainercv/blob/master/chainercv/utils/bbox/bbox_iou.py
+    '''
+    N = box1.shape[0]
+    M = box2.shape[0]
+
+    lt = np.maximum(box1[:,None,:2], box2[:,:2])  # [N,M,2]
+    rb = np.minimum(box1[:,None,2:], box2[:,2:])  # [N,M,2]
+
+    wh = np.clip((rb-lt+1.), 0, None)      # [N,M,2]
+    inter = wh[:,:,0] * wh[:,:,1]  # [N,M]
+
+    area1 = (box1[:,2]-box1[:,0]+1.) * (box1[:,3]-box1[:,1]+1.)  # [N,]
+    area2 = (box2[:,2]-box2[:,0]+1.) * (box2[:,3]-box2[:,1]+1.)  # [M,]
+    iou = inter / (area1[:,None] + area2 - inter)
+    return iou
+
+
+def box_iou_org(box1, box2):
+    '''Compute the intersection over union of two sets of boxes.
+    The box order must be (xmin, ymin, xmax, ymax).
+    Args:
+      box1: (ndarray) bounding boxes, sized [N,4].
+      box2: (ndarray) bounding boxes, sized [M,4].
+    Return:
+      (ndarray) iou, sized [N,M].
+    Reference:
+      https://github.com/chainer/chainercv/blob/master/chainercv/utils/bbox/bbox_iou.py
+    '''
+    N = box1.shape[0]
+    M = box2.shape[0]
+
+    lt = np.maximum(box1[:,None,:2], box2[:,:2])  # [N,M,2]
+    rb = np.minimum(box1[:,None,2:], box2[:,2:])  # [N,M,2]
+
+    wh = np.clip((rb-lt), 0, None)      # [N,M,2]
+    inter = wh[:,:,0] * wh[:,:,1]  # [N,M]
+
+    area1 = (box1[:,2]-box1[:,0]) * (box1[:,3]-box1[:,1])  # [N,]
+    area2 = (box2[:,2]-box2[:,0]) * (box2[:,3]-box2[:,1])  # [M,]
+    iou = inter / (area1[:,None] + area2 - inter)
+    return iou
+
+
@@ -0,0 +1,38 @@
+# --------------------------------------------------------
+# Fast R-CNN
+# Copyright (c) 2015 Microsoft
+# Licensed under The MIT License [see LICENSE for details]
+# Written by Ross Girshick
+# --------------------------------------------------------
+
+import numpy as np
+
+
+def nms(dets, scores, threshold=0.5):
+    x1 = dets[:, 0]
+    y1 = dets[:, 1]
+    x2 = dets[:, 2]
+    y2 = dets[:, 3]
+    # scores = dets[:, 4]
+
+    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
+    order = scores.argsort()[::-1]
+
+    keep = []
+    while order.size > 0:
+        i = order[0]
+        keep.append(i)
+        xx1 = np.maximum(x1[i], x1[order[1:]])
+        yy1 = np.maximum(y1[i], y1[order[1:]])
+        xx2 = np.minimum(x2[i], x2[order[1:]])
+        yy2 = np.minimum(y2[i], y2[order[1:]])
+
+        w = np.maximum(0.0, xx2 - xx1 + 1)
+        h = np.maximum(0.0, yy2 - yy1 + 1)
+        inter = w * h
+        ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+        inds = np.where(ovr <= threshold)[0]
+        order = order[inds + 1]
+
+    return keep
@@ -0,0 +1,44 @@
+import os
+
+
+# file names, folders, paths
+
+def make_folder(res_path):
+    # create folder, if it does not exist
+    if not os.path.exists(res_path):
+        os.makedirs(res_path)
+
+
+def prepare_data_gen_folder(relative_path, sign_model_version, collection_name, res_folder_name='results'):
+    # create path to file that stores generated training data
+    res_path = '{}pytorch/{}/{}'.format(relative_path, res_folder_name, sign_model_version)
+    train_data_ext_file = '{}/line_generated_bboxes_{}.csv'.format(res_path, collection_name)
+    collection_subfolder = '{}/images/'.format(collection_name)
+    # create folder, if necessary
+    make_folder(res_path)
+    # remove generated file, if it exists
+    if os.path.isfile(train_data_ext_file):
+        os.remove(train_data_ext_file)
+
+    return train_data_ext_file, collection_subfolder, res_path
+
+
+def prepare_data_gen_folder_slim(collection_name, res_path_base):
+    # create path to file that stores generated training data
+
+    train_data_ext_file = '{}/line_generated_bboxes_{}.csv'.format(res_path_base, collection_name)
+    collection_subfolder = '{}/images/'.format(collection_name)
+    # create folder, if necessary
+    make_folder(res_path_base)
+    # remove generated file, if it exists
+    if os.path.isfile(train_data_ext_file):
+        os.remove(train_data_ext_file)
+
+    return train_data_ext_file, collection_subfolder
+
+
+def clean_cdli(cdli_str):
+    # remove Vs, Rs
+    out_str = cdli_str.replace("Vs", "")
+    out_str = out_str.replace("Rs", "")
+    return out_str
@@ -0,0 +1,384 @@
+import time
+from collections import OrderedDict
+import copy
+
+from tqdm import tqdm
+import numpy as np
+
+import torch
+import torch.nn as nn
+from torch.autograd import Variable
+from torch.nn.modules.module import _addindent
+
+import torchvision
+from torchvision.transforms import *
+
+import matplotlib.pyplot as plt
+
+
+# HELPER FUNCTIONS
+
+
+def weights_init(m):
+    if isinstance(m, nn.Linear):
+        m.weight.data.normal_(0, 0.01)  # 0.005
+        m.bias.data.zero_()
+
+
+def torch_summarize(model, show_weights=True, show_parameters=True):
+    # code found here: https://stackoverflow.com/questions/42480111/model-summary-in-pytorch
+    """Summarizes torch model by showing trainable parameters and weights."""
+    tmpstr = model.__class__.__name__ + ' (\n'
+    for key, module in model._modules.items():
+        # if the module contains layers, call this function recursively to get params and weights
+        if type(module) in [
+            torch.nn.modules.container.Container,
+            torch.nn.modules.container.Sequential
+        ]:
+            modstr = torch_summarize(module)
+        else:
+            modstr = module.__repr__()
+        modstr = _addindent(modstr, 2)
+
+        params = sum([np.prod(p.size()) for p in module.parameters()])
+        weights = tuple([tuple(p.size()) for p in module.parameters()])
+
+        tmpstr += '  (' + key + '): ' + modstr
+        if show_weights:
+            tmpstr += ', weights={}'.format(weights)
+        if show_parameters:
+            tmpstr +=  ', parameters={}'.format(params)
+        tmpstr += '\n'
+
+    tmpstr = tmpstr + ')'
+    return tmpstr
+
+
+def summary(mymodule, input_size):
+    # code from PR by isaykatsman https://github.com/pytorch/pytorch/pull/3043
+    def register_hook(module):
+        def hook(module, input, output):
+            if module._modules:  # only want base layers
+                return
+            class_name = str(module.__class__).split('.')[-1].split("'")[0]
+            module_idx = len(summary)
+            m_key = '%s-%i' % (class_name, module_idx + 1)
+            summary[m_key] = OrderedDict()
+            summary[m_key]['input_shape'] = list(input[0].size())
+            summary[m_key]['input_shape'][0] = None
+            if output.__class__.__name__ == 'tuple':
+                summary[m_key]['output_shape'] = list(output[0].size())
+            else:
+                summary[m_key]['output_shape'] = list(output.size())
+            summary[m_key]['output_shape'][0] = None
+
+            params = 0
+            # iterate through parameters and count num params
+            for name, p in module._parameters.items():
+                params += torch.numel(p.data)
+                summary[m_key]['trainable'] = p.requires_grad
+
+            summary[m_key]['nb_params'] = params
+
+        if not isinstance(module, torch.nn.Sequential) and \
+           not isinstance(module, torch.nn.ModuleList) and \
+           not (module == mymodule):
+            hooks.append(module.register_forward_hook(hook))
+
+    # check if there are multiple inputs to the network
+    if isinstance(input_size[0], (list, tuple)):
+        x = [Variable(torch.rand(1, *in_size)) for in_size in input_size]
+    else:
+        x = Variable(torch.randn(1, *input_size))
+
+    # create properties
+    summary = OrderedDict()
+    hooks = []
+    # register hook
+    mymodule.apply(register_hook)
+    # make a forward pass
+    mymodule(x)
+    # remove these hooks
+    for h in hooks:
+        h.remove()
+
+    # print out neatly
+    def get_names(module, name, acc):
+        if not module._modules:
+            acc.append(name)
+        else:
+            for key in module._modules.keys():
+                p_name = key if name == "" else name + "." + key
+                get_names(module._modules[key], p_name, acc)
+    names = []
+    get_names(mymodule, "", names)
+
+    col_width = 25  # should be >= 12
+    summary_width = 61
+
+    def crop(s):
+        return s[:col_width] if len(s) > col_width else s
+
+    print('_' * summary_width)
+    print('{0: <{3}} {1: <{3}} {2: <{3}}'.format(
+        'Layer (type)', 'Output Shape', 'Param #', col_width))
+    print('=' * summary_width)
+    total_params = 0
+    trainable_params = 0
+    for (i, l_type), l_name in zip(enumerate(summary), names):
+        d = summary[l_type]
+        total_params += d['nb_params']
+        if 'trainable' in d and d['trainable']:
+            trainable_params += d['nb_params']
+        print('{0: <{3}} {1: <{3}} {2: <{3}}'.format(
+            crop(l_name + ' (' + l_type[:-2] + ')'), crop(str(d['output_shape'])),
+            crop(str(d['nb_params'])), col_width))
+        if i < len(summary) - 1:
+            print('_' * summary_width)
+    print('=' * summary_width)
+    print('Total params: ' + str(total_params))
+    print('Trainable params: ' + str(trainable_params))
+    print('Non-trainable params: ' + str((total_params - trainable_params)))
+    print('_' * summary_width)
+
+
+def visualize_model(model, dataloader, re_transform, device, num_images=6):
+    was_training = model.training
+    images_so_far = 0
+    fig = plt.figure(figsize=(10, 10))
+
+    # switch batchnorm and dropout to eval mode
+    model.eval()
+
+    with torch.no_grad():
+        for i, (inputs, labels) in enumerate(dataloader):
+            inputs = inputs.to(device)
+            labels = labels.to(device)
+
+            # compute predictions using the model
+            outputs = model(inputs)
+            _, preds = torch.max(outputs, 1)
+
+            for j in range(inputs.size()[0]):
+                images_so_far += 1
+                ax = plt.subplot(num_images // 2, 2, images_so_far)
+                ax.axis('off')
+                ax.set_title('predicted: {}'.format(preds[j]))
+
+                ax.imshow(re_transform(inputs.cpu().data[j].clone()), cmap=plt.cm.Greys_r)
+
+                if images_so_far == num_images:
+                    model.train(mode=was_training)
+                    return
+        model.train(mode=was_training)
+
+
+def prepare_embedding(model_feature, dataloader, re_transform, device):
+
+    # switch batchnorm and dropout to eval mode
+    model_feature.eval()
+
+    f_list = []
+    i_list = []
+    l_list = []
+
+    with torch.no_grad():
+        # inputs, labels = next(iter(dataloaders['train']))
+        for inputs, labels in dataloader:
+            em_sz = inputs.shape[0]
+
+            # append labels
+            l_list.append(labels)
+
+            # undo transform, convert to RGB, and convert back to tensor
+            t_list = []
+            for t in inputs:
+                t_list.append(torchvision.transforms.ToTensor()(re_transform(t.clone()).convert('RGB')))
+
+            # append images
+            i_list.append(torch.stack(t_list))
+
+            # compute feature
+            inputs = inputs.to(device)
+
+            # append features
+            f_list.append(model_feature(inputs).view(em_sz, -1).data)
+
+    return torch.cat(f_list), torch.cat(l_list).numpy(), torch.cat(i_list)
+
+
+def prepare_prcurves(model, dataloader, device):
+    # create softmax over the class dimension (explicit dim avoids the implicit-dim deprecation warning)
+    softmax = nn.Softmax(dim=1)
+    # loop over dataset with dataloader
+    p_list = []
+    l_list = []
+    with torch.no_grad():
+        # inputs, labels = next(iter(dataloaders['train']))
+        for inputs, labels in dataloader:
+            # append labels
+            l_list.append(labels)
+            # prepare input
+            inputs = inputs.to(device)
+            # apply network model
+            output = model(inputs)
+            # compute softmax
+            predicted = softmax(output)
+            # append class probabilities
+            p_list.append(predicted.data.cpu())
+
+    # concat to tensors
+    return torch.cat(p_list), torch.cat(l_list)
+
+
+def preprocess_tablet_im(pil_im, scale, shift=5.0):
+    # compute scaled size
+    imw, imh = pil_im.size
+    imw = int(imw * scale)
+    imh = int(imh * scale)
+    # determine crop size
+    crop_sz = [int(imh - shift), int(imw - shift)]
+    # tensor-space transforms
+    ts_transform = torchvision.transforms.Compose([
+        torchvision.transforms.ToTensor(),
+        torchvision.transforms.Normalize(mean=[0.5], std=[1]),  # normalize
+    ])
+    # compose transforms
+    tablet_transform = torchvision.transforms.Compose([
+            torchvision.transforms.Lambda(lambda x: x.convert('L')),  # convert to gray
+            Resize((imh, imw)),  # resize according to scale
+            FiveCrop((crop_sz[0], crop_sz[1])),  # oversample
+            torchvision.transforms.Lambda(
+                lambda crops: torch.stack([ts_transform(crop) for crop in crops])),  # returns a 4D tensor
+        ])
+    # apply transforms
+    im_list = tablet_transform(pil_im)
+    return im_list
+
+
+def predict(model, im_list, device, use_bbox_reg=False):
+    inputs = im_list
+
+    with torch.no_grad():  # faster, less memory usage
+        # prepare input
+        inputs = inputs.to(device)
+
+        # apply network model
+        # output = model(inputs)  # consumes too much memory
+        output = []
+        for in_im in inputs:
+            output.append(model(in_im.unsqueeze(0)))
+        output = torch.cat(output, dim=0)
+
+        # convert to numpy
+        predicted = output.data.cpu().numpy()
+        # free memory?!
+        # del output
+
+    # TODO: integrate bbox regression (result_roi is never filled, so use_bbox_reg=True currently fails in np.stack)
+    result_roi = []
+    # stack detections to single tensor
+    predicted_roi = []
+    if use_bbox_reg:
+        predicted_roi = np.stack(result_roi).squeeze()
+
+    return predicted, predicted_roi
+
+
+# TRAINER HELPER
+
+
+def get_tensorboard_writer(logs_folder='runs_new', comment=''):
+    # init logger
+    import os
+    import socket
+    from datetime import datetime
+    from tensorboardX import SummaryWriter
+
+    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
+    log_dir = os.path.join(logs_folder, current_time + '_' + socket.gethostname() + comment)
+    writer = SummaryWriter(log_dir=log_dir)  # comment='_{}'.format(weights_path.split('/')[1].split('.')[0])
+    return writer
+
+
+# TRAINER FUNCTIONS
+
+def train_model(model, criterion, optimizer, scheduler, writer, dataloaders, dataset_sizes, device, num_epochs=25, test_every=10):
+    ''' generic trainer function '''
+    since = time.time()
+
+    best_model_wts = copy.deepcopy(model.state_dict())
+    best_acc = 0.0
+    best_epoch = 0
+
+    for epoch in tqdm(range(num_epochs)):
+        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
+        print('-' * 10)
+
+        # Each epoch has a training and validation phase
+
+        phases = ['train', 'dev']
+        if epoch % test_every != 0:
+            phases = ['train']
+
+        for phase in phases:
+            if phase == 'train':
+                scheduler.step()
+                model.train()  # Set model to training mode
+            else:
+                model.eval()  # Set model to evaluate mode
+
+            running_loss = 0.0
+            running_corrects = 0
+
+            # Iterate over data.
+            for inputs, labels in dataloaders[phase]:
+                inputs = inputs.to(device)
+                labels = labels.to(device)
+
+                # zero the parameter gradients
+                optimizer.zero_grad()
+
+                # forward
+                # track history only if in train
+                with torch.set_grad_enabled(phase == 'train'):
+                    outputs = model(inputs)
+                    _, preds = torch.max(outputs, 1)
+                    loss = criterion(outputs, labels)
+
+                    # backward + optimize only if in training phase
+                    if phase == 'train':
+                        loss.backward()
+                        optimizer.step()
+                        # else:
+                        # for name, param in model.named_parameters():
+                        #     writer.add_histogram(name, param.clone().cpu().data.numpy(), epoch)
+
+                # statistics
+                running_loss += loss.item()  # * inputs.size(0)  # uncomment this to fix a legacy bug XXX
+                running_corrects += torch.sum(preds == labels.data)
+
+            epoch_loss = running_loss / dataset_sizes[phase]
+            epoch_acc = running_corrects.double() / float(dataset_sizes[phase])
+
+            # write to logger
+            writer.add_scalar('data/{}/loss'.format(phase), epoch_loss, epoch)
+            writer.add_scalar('data/{}/acc'.format(phase), epoch_acc, epoch)
+
+            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
+            print('{} Number correct: {} '.format(phase, running_corrects))
+
+            # deep copy the model
+            if phase == 'dev' and epoch_acc > best_acc:
+                best_acc = epoch_acc
+                best_model_wts = copy.deepcopy(model.state_dict())
+                best_epoch = epoch
+
+    time_elapsed = time.time() - since
+    print('Training complete in {:.0f}m {:.0f}s'.format(
+        time_elapsed // 60, time_elapsed % 60))
+    print('Best val Acc: {:.4f} at epoch {}'.format(best_acc, best_epoch))
+
+    # load best model weights
+    model.load_state_dict(best_model_wts)
+    return model
@@ -0,0 +1,134 @@
+import torch
+
+
+def change_box_order(boxes, order):
+    '''Change box order between (xmin,ymin,xmax,ymax) and (xcenter,ycenter,width,height).
+
+    Args:
+      boxes: (tensor) bounding boxes, sized [N,4].
+      order: (str) either 'xyxy2xywh' or 'xywh2xyxy'.
+
+    Returns:
+      (tensor) converted bounding boxes, sized [N,4].
+    '''
+    assert order in ['xyxy2xywh','xywh2xyxy']
+    a = boxes[:,:2]
+    b = boxes[:,2:]
+    if order == 'xyxy2xywh':
+        return torch.cat([(a+b)/2,b-a], 1)
+    return torch.cat([a-b/2,a+b/2], 1)
+
+def box_clamp(boxes, xmin, ymin, xmax, ymax):
+    '''Clamp boxes.
+
+    Args:
+      boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [N,4].
+      xmin: (number) min value of x.
+      ymin: (number) min value of y.
+      xmax: (number) max value of x.
+      ymax: (number) max value of y.
+
+    Returns:
+      (tensor) clamped boxes.
+    '''
+    boxes[:,0].clamp_(min=xmin, max=xmax)
+    boxes[:,1].clamp_(min=ymin, max=ymax)
+    boxes[:,2].clamp_(min=xmin, max=xmax)
+    boxes[:,3].clamp_(min=ymin, max=ymax)
+    return boxes
+
+def box_select(boxes, xmin, ymin, xmax, ymax):
+    '''Select boxes in range (xmin,ymin,xmax,ymax).
+
+    Args:
+      boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [N,4].
+      xmin: (number) min value of x.
+      ymin: (number) min value of y.
+      xmax: (number) max value of x.
+      ymax: (number) max value of y.
+
+    Returns:
+      (tensor) selected boxes, sized [M,4].
+      (tensor) selected mask, sized [N,].
+    '''
+    mask = (boxes[:,0]>=xmin) & (boxes[:,1]>=ymin) \
+         & (boxes[:,2]<=xmax) & (boxes[:,3]<=ymax)
+    boxes = boxes[mask,:]
+    return boxes, mask
+
+def box_iou(box1, box2):
+    '''Compute the intersection over union of two sets of boxes.
+
+    The box order must be (xmin, ymin, xmax, ymax).
+
+    Args:
+      box1: (tensor) bounding boxes, sized [N,4].
+      box2: (tensor) bounding boxes, sized [M,4].
+
+    Return:
+      (tensor) iou, sized [N,M].
+
+    Reference:
+      https://github.com/chainer/chainercv/blob/master/chainercv/utils/bbox/bbox_iou.py
+    '''
+    N = box1.size(0)
+    M = box2.size(0)
+
+    lt = torch.max(box1[:,None,:2], box2[:,:2])  # [N,M,2]
+    rb = torch.min(box1[:,None,2:], box2[:,2:])  # [N,M,2]
+
+    wh = (rb-lt).clamp(min=0)      # [N,M,2]
+    inter = wh[:,:,0] * wh[:,:,1]  # [N,M]
+
+    area1 = (box1[:,2]-box1[:,0]) * (box1[:,3]-box1[:,1])  # [N,]
+    area2 = (box2[:,2]-box2[:,0]) * (box2[:,3]-box2[:,1])  # [M,]
+    iou = inter / (area1[:,None] + area2 - inter)
+    return iou
+
+def box_nms(bboxes, scores, threshold=0.5):
+    '''Non maximum suppression.
+
+    Args:
+      bboxes: (tensor) bounding boxes, sized [N,4].
+      scores: (tensor) confidence scores, sized [N,].
+      threshold: (float) overlap threshold.
+
+    Returns:
+      keep: (tensor) selected indices.
+
+    Reference:
+      https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/nms/py_cpu_nms.py
+    '''
+    x1 = bboxes[:,0]
+    y1 = bboxes[:,1]
+    x2 = bboxes[:,2]
+    y2 = bboxes[:,3]
+
+    areas = (x2-x1 + 1) * (y2-y1 + 1)
+    _, order = scores.sort(0, descending=True)
+
+    keep = []
+    while order.numel() > 0:
+        if order.numel() == 1:
+            i = order.item()
+            keep.append(i)
+            break
+
+        i = order[0].item()  # store a plain int so that keep is a list of ints
+        keep.append(i)
+
+        xx1 = x1[order[1:]].clamp(min=x1[i].item())
+        yy1 = y1[order[1:]].clamp(min=y1[i].item())
+        xx2 = x2[order[1:]].clamp(max=x2[i].item())
+        yy2 = y2[order[1:]].clamp(max=y2[i].item())
+
+        w = (xx2-xx1 + 1).clamp(min=0)
+        h = (yy2-yy1 + 1).clamp(min=0)
+        inter = w * h
+
+        overlap = inter / (areas[i] + areas[order[1:]] - inter)
+        ids = (overlap <= threshold).nonzero().squeeze()
+        if ids.numel() == 0:
+            break
+        order = order[ids+1]
+    return torch.tensor(keep, dtype=torch.long)
@@ -0,0 +1,246 @@
+'''Encode object boxes and labels.'''
+import math
+import torch
+import itertools
+import time
+import numpy as np
+
+from .meshgrid import meshgrid
+from .box import box_iou, box_nms, change_box_order
+
+
+class FPNSSDBoxCoder:
+    def __init__(self, input_size=[512., 512.], with_64=False, create_bg_class=True, with_4_aspects=False, with_4_scales=False):
+        self.num_anchors = 12  # 12  # 9
+        # self.anchor_areas = (32 * 32., 64 * 64., 128 * 128., 256 * 256., 341 * 341., 426 * 426., 512 * 512.)
+        # self.aspect_ratios = (1 / 2., 1 / 1., 2 / 1.)
+        # self.scale_ratios = (1., pow(2, 1 / 3.), pow(2, 2 / 3.))
+
+        # compute num boxes for 500x500 patch
+        # 500/16(stride) -> 32
+        # 500/32(stride) -> 16
+        # 500/64(stride) -> 8
+        # (16^2 + 8^2) * num_anchors -> for 12: 3840
+        # (32^2 + 16^2 + 8^2) * num_anchors -> for 12: 16128
+
+        self.with_64 = with_64
+        if self.with_64:
+            self.anchor_areas = [64 * 64., 128 * 128., 256 * 256.]
+        else:
+            self.anchor_areas = [128 * 128., 256 * 256.]
+        if with_4_aspects:
+            self.aspect_ratios = [3 / 5., 1 / 1., 2 / 1., 3 / 1.]
+        else:
+            self.aspect_ratios = [2 / 1., 1 / 1., 2 / 1., 3 / 1.]  # note: 2/1 is listed twice; 4 aspects x 3 scales = 12 anchors per cell
+        if with_4_scales:
+            assert with_4_scales != with_4_aspects, "Cannot use with_4_scales and with_4_aspects simultaneously!"
+            self.scale_ratios = [0.8, 1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
+            self.aspect_ratios = [1 / 1., 2 / 1., 3 / 1.]
+        else:
+            self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
+
+        self.input_size = torch.tensor(input_size).float()
+        self.anchor_boxes = self._get_anchor_boxes(input_size=self.input_size)
+
+        self.create_bg_class = create_bg_class
+
+    def _get_anchor_wh(self):
+        '''Compute anchor width and height for each feature map.
+
+        Returns:
+          anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
+        '''
+        anchor_wh = []
+        for s in self.anchor_areas:
+            for ar in self.aspect_ratios:  # w/h = ar
+                h = math.sqrt(s / ar)
+                w = ar * h
+                for sr in self.scale_ratios:  # scale
+                    anchor_h = h * sr
+                    anchor_w = w * sr
+                    anchor_wh.append([anchor_w, anchor_h])
+        num_fms = len(self.anchor_areas)
+        return torch.tensor(anchor_wh).view(num_fms, -1, 2)
+
+    def _get_anchor_boxes(self, input_size):
+        '''Compute anchor boxes for each feature map.
+
+        Args:
+          input_size: (tensor) model input size of (w,h).
+
+        Returns:
+          anchor_boxes: (tensor) anchor boxes for each feature map. Each of size [#anchors,4],
+            where #anchors = fmw * fmh * #anchors_per_cell
+        '''
+        num_fms = len(self.anchor_areas)
+        anchor_wh = self._get_anchor_wh()
+        # fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)]  # p3 -> p7 feature map sizes
+        if self.with_64:   # num_fms == 3:
+            fm_sizes = [(input_size / pow(2., i + 4)).ceil() for i in range(num_fms)]  # p4 -> p6 feature map sizes
+        else:  # num_fms == 2:
+            fm_sizes = [(input_size / pow(2., i + 5)).ceil() for i in range(num_fms)]  # p5 -> p6 feature map sizes
+
+        boxes = []
+        for i in range(num_fms):
+            fm_size = fm_sizes[i]
+            grid_size = input_size / fm_size
+            fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
+            xy = meshgrid(fm_w, fm_h) + 0.5  # [fm_h*fm_w, 2]
+            xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, self.num_anchors, 2)
+            wh = anchor_wh[i].view(1, 1, self.num_anchors, 2).expand(fm_h, fm_w, self.num_anchors, 2)
+            box = torch.cat([xy - wh / 2., xy + wh / 2.], 3)  # [x,y,x,y]
+            boxes.append(box.view(-1, 4))
+        return torch.cat(boxes, 0)
+
+    def encode(self, boxes, labels):
+        '''Encode target bounding boxes and class labels.
+
+        SSD coding rules:
+          tx = (x - anchor_x) / anchor_w
+          ty = (y - anchor_y) / anchor_h
+          tw = log(w / anchor_w)
+          th = log(h / anchor_h)
+
+        Args:
+          boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [#obj,4].
+          labels: (tensor) object class labels, sized [#obj,].
+
+        Returns:
+          loc_targets: (tensor) encoded bounding boxes, sized [#anchors,4].
+          cls_targets: (tensor) encoded class labels, sized [#anchors,].
+
+        Reference:
+          https://github.com/chainer/chainercv/blob/master/chainercv/links/model/ssd/multibox_coder.py
+        '''
+
+        def argmax(x):
+            '''Find the max value index(row & col) of a 2D tensor.'''
+            v, i = x.max(0)
+            j = v.max(0)[1].item()
+            return (i[j], j)
+
+        # before_ts = time.time()
+
+        anchor_boxes = self.anchor_boxes
+        ious = box_iou(anchor_boxes, boxes)  # [#anchors, #obj]
+        index = torch.empty(anchor_boxes.size(0), dtype=torch.long).fill_(-1)  # TD: for every anchorbox
+        masked_ious = ious.clone()
+
+        # TD: this whole while loop seems unnecessary... maybe performance issue?!
+        while True:
+            # TD: this should be run for every gt box with fitting anchor
+            i, j = argmax(masked_ious)
+            if masked_ious[i, j] < 1e-6:
+                break
+            index[i] = j
+            # TD: zero row and column
+            masked_ious[i, :] = 0
+            masked_ious[:, j] = 0
+
+        # TD: deal with anchor boxes that have not been assigned yet
+        mask = (index < 0) & (ious.max(1)[0] >= 0.5)
+        if mask.any():
+            index[mask] = ious[mask].max(1)[1]   # TD: assign if iou more than 0.5
+        # TD: does this clamp remove index -1 otherwise boxes[0] selected very often?!
+        boxes = boxes[index.clamp(min=0)]  # negative index not supported
+        boxes = change_box_order(boxes, 'xyxy2xywh')
+        anchor_boxes = change_box_order(anchor_boxes, 'xyxy2xywh')
+
+        loc_xy = (boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
+        loc_wh = torch.log(boxes[:, 2:] / anchor_boxes[:, 2:])
+        loc_targets = torch.cat([loc_xy, loc_wh], 1)
+
+        if self.create_bg_class:
+            # TD: does this clamp remove index -1 otherwise labels[0] selected very often?!
+            cls_targets = 1 + labels[index.clamp(min=0)]
+        else:
+            # if background class 0 already exists in labels
+            cls_targets = labels[index.clamp(min=0)]
+
+        # ok here index -1 targets are set to zero anyways
+        cls_targets[index < 0] = 0
+
+        # print('time spent encoding: {}'.format(time.time() - before_ts))
+
+        return loc_targets, cls_targets
+
+    def decode(self, loc_preds, cls_preds, score_thresh=0.6, nms_thresh=0.45):
+        '''Decode predicted loc/cls back to real box locations and class labels.
+
+        Args:
+          loc_preds: (tensor) predicted loc, sized [#anchors,4].
+          cls_preds: (tensor) predicted conf, sized [#anchors,#classes].
+          score_thresh: (float) threshold for object confidence score.
+          nms_thresh: (float) threshold for box nms.
+
+        Returns:
+          boxes: (tensor) bbox locations, sized [#obj,4].
+          labels, scores: (tensor) class labels and confidence scores, each sized [#obj,].
+        '''
+        anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
+        xy = loc_preds[:, :2] * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
+        wh = loc_preds[:, 2:].exp() * anchor_boxes[:, 2:]
+        box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)
+
+        boxes = []
+        labels = []
+        scores = []
+        num_classes = cls_preds.size(1)
+        if self.create_bg_class:
+            for i in range(num_classes - 1):
+                score = cls_preds[:, i + 1]  # class i corresponds to (i+1) column
+                mask = score > score_thresh
+                if not mask.any():
+                    continue
+                box = box_preds[mask]
+                score = score[mask]
+                # print(box.size())
+                # print(score.size())
+
+                keep = box_nms(box, score, nms_thresh)
+                boxes.append(box[keep])
+                labels.append(torch.empty_like(keep).fill_(i))
+                scores.append(score[keep])
+        else:
+            for i in range(1, num_classes):
+                score = cls_preds[:, i]  # class i corresponds to column i (column 0 is background)
+                mask = score > score_thresh
+                if not mask.any():
+                    continue
+                box = box_preds[mask]
+                score = score[mask]
+                # print(box.size())
+                # print(score.size())
+
+                keep = box_nms(box, score, nms_thresh)
+                boxes.append(box[keep])
+                labels.append(torch.empty_like(keep).fill_(i))
+                scores.append(score[keep])
+
+        # concatenate if not empty
+        if len(boxes) > 0:
+            boxes = torch.cat(boxes, 0)
+            labels = torch.cat(labels, 0)
+            scores = torch.cat(scores, 0)
+        return boxes, labels, scores
+
+    def decode_boxes(self, loc_preds):
+
+        anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
+        xy = loc_preds[:, :2] * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
+        wh = loc_preds[:, 2:].exp() * anchor_boxes[:, 2:]
+        box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)
+
+        boxes = box_preds
+        return boxes
+
+
+def test():
+    box_coder = FPNSSDBoxCoder()
+    print(box_coder.anchor_boxes.size())
+    boxes = torch.tensor([[0, 0, 100, 100], [100, 100, 200, 200]], dtype=torch.float)
+    labels = torch.tensor([0, 1], dtype=torch.long)
+    loc_targets, cls_targets = box_coder.encode(boxes, labels)
+    print(loc_targets.size(), cls_targets.size())
+
+# test()
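The anchor-to-object matching at the end of `encode` above can be illustrated without tensors. This is a minimal sketch, assuming the loop repeatedly takes the largest remaining IoU (only the tail of that loop is visible here). The name `greedy_match` and the toy IoU values are hypothetical and not part of the repository.

```python
def greedy_match(ious, fallback_thresh=0.5, eps=1e-6):
    """Return, per anchor, the index of its matched object or -1.

    ious: list of rows (one per anchor), each holding one IoU per object.
    """
    num_anchors = len(ious)
    num_objs = len(ious[0]) if num_anchors else 0
    index = [-1] * num_anchors
    masked = [row[:] for row in ious]  # working copy that gets zeroed out

    while True:
        # Find the largest remaining IoU over all (anchor, object) pairs.
        best, bi, bj = 0.0, -1, -1
        for i in range(num_anchors):
            for j in range(num_objs):
                if masked[i][j] > best:
                    best, bi, bj = masked[i][j], i, j
        if best < eps:
            break
        index[bi] = bj
        # Zero the row and the column so neither is matched again.
        for j in range(num_objs):
            masked[bi][j] = 0.0
        for i in range(num_anchors):
            masked[i][bj] = 0.0

    # Fallback: unassigned anchors take their best object if IoU >= threshold.
    for i in range(num_anchors):
        if index[i] < 0 and num_objs and max(ious[i]) >= fallback_thresh:
            index[i] = ious[i].index(max(ious[i]))
    return index


ious = [
    [0.9, 0.1],
    [0.6, 0.2],
    [0.0, 0.3],
    [0.1, 0.0],
]
labels = [5, 7]
index = greedy_match(ious)
# With a separate background class, labels shift by one and
# unassigned anchors (index -1) become background (0).
cls_targets = [0 if k < 0 else 1 + labels[k] for k in index]
```

In this toy case the second object is matched to the third anchor even though its IoU is only 0.3, because the greedy pass guarantees every object at least one anchor before the 0.5 fallback is applied.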
@@ -0,0 +1,169 @@
+'''Encode object boxes and labels.'''
+import math
+import torch
+import numpy as np
+
+from .meshgrid import meshgrid
+from .box import box_iou, box_nms, change_box_order
+
+
+class RetinaBoxCoder:
+    def __init__(self, input_size=[512., 512.], with_64=False, create_bg_class=True, with_4_aspects=False, with_4_scales=False):
+        self.num_anchors = 12
+        # self.anchor_areas = (32*32., 64*64., 128*128., 256*256., 512*512.)  # p3 -> p7
+        # self.aspect_ratios = (1/2., 1/1., 2/1.)
+        # self.scale_ratios = (1., pow(2,1/3.), pow(2,2/3.))
+        self.with_64 = with_64
+        if self.with_64:
+            self.anchor_areas = [64 * 64., 128 * 128., 256 * 256.]
+        else:
+            self.anchor_areas = [128 * 128., 256 * 256.]
+        if with_4_aspects:
+            self.aspect_ratios = [3 / 5., 1 / 1., 2 / 1., 3 / 1.]
+        else:
+            self.aspect_ratios = [2 / 1., 1 / 1., 2 / 1., 3 / 1.]  # [1 / 0.5, 1 / 1., 2 / 1., 3 / 1.]
+        if with_4_scales:
+            assert with_4_scales != with_4_aspects, "Cannot use with_4_scales and with_4_aspects simultaneously!"
+            self.scale_ratios = [0.8, 1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
+            self.aspect_ratios = [1 / 1., 2 / 1., 3 / 1.]
+        else:
+            self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
+
+        self.input_size = torch.tensor(input_size).float()
+        self.anchor_boxes = self._get_anchor_boxes(input_size=self.input_size)
+
+        self.create_bg_class = create_bg_class
+
+    def _get_anchor_wh(self):
+        '''Compute anchor width and height for each feature map.
+
+        Returns:
+          anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
+        '''
+        anchor_wh = []
+        for s in self.anchor_areas:
+            for ar in self.aspect_ratios:  # w/h = ar
+                h = math.sqrt(s / ar)
+                w = ar * h
+                for sr in self.scale_ratios:  # scale
+                    anchor_h = h * sr
+                    anchor_w = w * sr
+                    anchor_wh.append([anchor_w, anchor_h])
+        num_fms = len(self.anchor_areas)
+        return torch.Tensor(anchor_wh).view(num_fms, -1, 2)
+
+    def _get_anchor_boxes(self, input_size):
+        '''Compute anchor boxes for each feature map.
+
+        Args:
+          input_size: (tensor) model input size of (w,h).
+
+        Returns:
+          boxes: (tensor) anchor boxes of all feature maps concatenated, sized [#anchors,4],
+                        where #anchors = sum over feature maps of fmw * fmh * #anchors_per_cell
+        '''
+        num_fms = len(self.anchor_areas)
+        anchor_wh = self._get_anchor_wh()
+        # fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)]  # p3 -> p7 feature map sizes
+        if self.with_64:   # num_fms == 3:
+            fm_sizes = [(input_size / pow(2., i + 4)).ceil() for i in range(num_fms)]  # p4 -> p6 feature map sizes
+        else:  # num_fms == 2:
+            fm_sizes = [(input_size / pow(2., i + 5)).ceil() for i in range(num_fms)]  # p5 -> p6 feature map sizes
+
+        boxes = []
+        for i in range(num_fms):
+            fm_size = fm_sizes[i]
+            grid_size = input_size / fm_size
+            fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
+            xy = meshgrid(fm_w, fm_h) + 0.5  # [fm_h*fm_w, 2]
+            xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, self.num_anchors, 2)
+            wh = anchor_wh[i].view(1, 1, self.num_anchors, 2).expand(fm_h, fm_w, self.num_anchors, 2)
+            box = torch.cat([xy - wh / 2., xy + wh / 2.], 3)  # [x,y,x,y]
+            boxes.append(box.view(-1, 4))
+        return torch.cat(boxes, 0)
+
+    def encode(self, boxes, labels):
+        '''Encode target bounding boxes and class labels.
+
+        We follow the Faster R-CNN box encoding:
+          tx = (x - anchor_x) / anchor_w
+          ty = (y - anchor_y) / anchor_h
+          tw = log(w / anchor_w)
+          th = log(h / anchor_h)
+
+        Args:
+          boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [#obj, 4].
+          labels: (tensor) object class labels, sized [#obj,].
+
+        Returns:
+          loc_targets: (tensor) encoded bounding boxes, sized [#anchors,4].
+          cls_targets: (tensor) encoded class labels, sized [#anchors,].
+        '''
+        anchor_boxes = self.anchor_boxes
+        ious = box_iou(anchor_boxes, boxes)
+        max_ious, max_ids = ious.max(1)
+        boxes = boxes[max_ids]
+
+        boxes = change_box_order(boxes, 'xyxy2xywh')
+        anchor_boxes = change_box_order(anchor_boxes, 'xyxy2xywh')
+
+        loc_xy = (boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
+        loc_wh = torch.log(boxes[:, 2:] / anchor_boxes[:, 2:])
+        loc_targets = torch.cat([loc_xy, loc_wh], 1)
+
+        if self.create_bg_class:
+            cls_targets = 1 + labels[max_ids]
+        else:
+            # if background class 0 already exists in labels
+            cls_targets = labels[max_ids]
+
+        cls_targets[max_ious < 0.5] = 0  # WATCH OUT HERE, this is just for testing!!
+        # ignore = (max_ious > 0.4) & (max_ious < 0.5)  # ignore ious between [0.4,0.5]
+        # cls_targets[ignore] = -1  # mark ignored to -1
+        return loc_targets, cls_targets
+
+    def decode(self, loc_preds, cls_preds, input_size, score_thresh=0.5, nms_thresh=0.5):
+        '''Decode outputs back to bounding box locations and class labels.
+
+        Args:
+          loc_preds: (tensor) predicted locations, sized [#anchors, 4].
+          cls_preds: (tensor) predicted class labels, sized [#anchors, #classes].
+          input_size: (tuple) model input size of (w,h).
+
+        Returns:
+          boxes: (tensor) decoded box locations, sized [#obj,4].
+          labels: (tensor) class labels for each box, sized [#obj,].
+        '''
+        CLS_THRESH = score_thresh
+        NMS_THRESH = nms_thresh
+
+        input_size = torch.Tensor(input_size)
+        # anchor_boxes = self._get_anchor_boxes(input_size)  # xywh
+        anchor_boxes = change_box_order(self._get_anchor_boxes(input_size), 'xyxy2xywh')
+
+        loc_xy = loc_preds[:, :2]
+        loc_wh = loc_preds[:, 2:]
+
+        xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
+        wh = loc_wh.exp() * anchor_boxes[:, 2:]
+        boxes = torch.cat([xy - wh / 2, xy + wh / 2], 1)  # [#anchors,4]
+
+        score, labels = cls_preds.sigmoid().max(1)  # [#anchors,]
+        ids = score > CLS_THRESH
+        ids = ids.nonzero().squeeze(1)  # [#obj,]; squeeze(1) keeps a 1-D index even for a single detection
+        keep = box_nms(boxes[ids], score[ids], threshold=NMS_THRESH)
+        return boxes[ids][keep], labels[ids][keep]  # , score[ids][keep]
+
+    def decode_boxes(self, loc_preds):
+
+        anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
+
+        loc_xy = loc_preds[:, :2]
+        loc_wh = loc_preds[:, 2:]
+
+        xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
+        wh = loc_wh.exp() * anchor_boxes[:, 2:]
+        box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)
+
+        boxes = box_preds
+        return boxes
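The box parameterization documented in `encode` and inverted in `decode_boxes` above can be checked with a small round trip. This is a framework-free sketch using plain floats instead of tensors, assuming `xywh` means center coordinates plus width and height (which is what the `xy - wh / 2` decoding implies). The helper names `xyxy_to_xywh`, `encode_box` and `decode_box` are hypothetical and not part of the repository.

```python
import math


def xyxy_to_xywh(box):
    """Convert (xmin, ymin, xmax, ymax) to (center_x, center_y, w, h)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0, x1 - x0, y1 - y0)


def encode_box(box, anchor):
    """Encode one xyxy box relative to one xyxy anchor (tx, ty, tw, th)."""
    x, y, w, h = xyxy_to_xywh(box)
    ax, ay, aw, ah = xyxy_to_xywh(anchor)
    return ((x - ax) / aw, (y - ay) / ah, math.log(w / aw), math.log(h / ah))


def decode_box(loc, anchor):
    """Invert encode_box and return the box in xyxy order."""
    tx, ty, tw, th = loc
    ax, ay, aw, ah = xyxy_to_xywh(anchor)
    x, y = tx * aw + ax, ty * ah + ay
    w, h = math.exp(tw) * aw, math.exp(th) * ah
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)


anchor = (0.0, 0.0, 128.0, 128.0)
box = (10.0, 20.0, 110.0, 100.0)
loc = encode_box(box, anchor)
restored = decode_box(loc, anchor)
```

Offsets are normalized by the anchor size and sizes are encoded as log ratios, so a box identical to its anchor encodes to all zeros.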
@@ -0,0 +1,178 @@
+'''Encode object boxes and labels.'''
+import math
+import torch
+import numpy as np
+
+from .meshgrid import meshgrid
+from .box import box_iou, box_nms, change_box_order
+
+
+class RetinaBoxCoder:
+    def __init__(self, input_size=[512., 512.], with_64=False, create_bg_class=True, with_4_aspects=False, with_4_scales=False):
+        self.num_anchors = 12
+        # self.anchor_areas = (32*32., 64*64., 128*128., 256*256., 512*512.)  # p3 -> p7
+        # self.aspect_ratios = (1/2., 1/1., 2/1.)
+        # self.scale_ratios = (1., pow(2,1/3.), pow(2,2/3.))
+        self.with_64 = with_64
+        if self.with_64:
+            self.anchor_areas = [64 * 64., 128 * 128., 256 * 256.]
+        else:
+            self.anchor_areas = [128 * 128., 256 * 256.]
+        if with_4_aspects:
+            self.aspect_ratios = [3 / 5., 1 / 1., 2 / 1., 3 / 1.]
+        else:
+            self.aspect_ratios = [2 / 1., 1 / 1., 2 / 1., 3 / 1.]  # [1 / 0.5, 1 / 1., 2 / 1., 3 / 1.]
+        if with_4_scales:
+            assert with_4_scales != with_4_aspects, "Cannot use with_4_scales and with_4_aspects simultaneously!"
+            self.scale_ratios = [0.8, 1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
+            self.aspect_ratios = [1 / 1., 2 / 1., 3 / 1.]
+        else:
+            self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]
+
+        self.input_size = torch.tensor(input_size).float()
+        self.anchor_boxes = self._get_anchor_boxes(input_size=self.input_size)
+
+        self.create_bg_class = create_bg_class
+
+    def _get_anchor_wh(self):
+        '''Compute anchor width and height for each feature map.
+
+        Returns:
+          anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
+        '''
+        anchor_wh = []
+        for s in self.anchor_areas:
+            for ar in self.aspect_ratios:  # w/h = ar
+                h = math.sqrt(s / ar)
+                w = ar * h
+                for sr in self.scale_ratios:  # scale
+                    anchor_h = h * sr
+                    anchor_w = w * sr
+                    anchor_wh.append([anchor_w, anchor_h])
+        num_fms = len(self.anchor_areas)
+        return torch.Tensor(anchor_wh).view(num_fms, -1, 2)
+
+    def _get_anchor_boxes(self, input_size):
+        '''Compute anchor boxes for each feature map.
+
+        Args:
+          input_size: (tensor) model input size of (w,h).
+
+        Returns:
+          boxes: (tensor) anchor boxes of all feature maps concatenated, sized [#anchors,4],
+                        where #anchors = sum over feature maps of fmw * fmh * #anchors_per_cell
+        '''
+        num_fms = len(self.anchor_areas)
+        anchor_wh = self._get_anchor_wh()
+        # fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)]  # p3 -> p7 feature map sizes
+        if self.with_64:   # num_fms == 3:
+            fm_sizes = [(input_size / pow(2., i + 4)).ceil() for i in range(num_fms)]  # p4 -> p6 feature map sizes
+        else:  # num_fms == 2:
+            fm_sizes = [(input_size / pow(2., i + 5)).ceil() for i in range(num_fms)]  # p5 -> p6 feature map sizes
+
+        boxes = []
+        for i in range(num_fms):
+            fm_size = fm_sizes[i]
+            grid_size = input_size / fm_size
+            fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
+            xy = meshgrid(fm_w, fm_h) + 0.5  # [fm_h*fm_w, 2]
+            xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, self.num_anchors, 2)
+            wh = anchor_wh[i].view(1, 1, self.num_anchors, 2).expand(fm_h, fm_w, self.num_anchors, 2)
+            box = torch.cat([xy - wh / 2., xy + wh / 2.], 3)  # [x,y,x,y]
+            boxes.append(box.view(-1, 4))
+        return torch.cat(boxes, 0)
+
+    def encode(self, boxes, labels, linemap):
+        '''Encode target bounding boxes and class labels.
+
+        We follow the Faster R-CNN box encoding:
+          tx = (x - anchor_x) / anchor_w
+          ty = (y - anchor_y) / anchor_h
+          tw = log(w / anchor_w)
+          th = log(h / anchor_h)
+
+        Args:
+          boxes: (tensor) bounding boxes of (xmin,ymin,xmax,ymax), sized [#obj, 4].
+          labels: (tensor) object class labels, sized [#obj,].
+
+        Returns:
+          loc_targets: (tensor) encoded bounding boxes, sized [#anchors,4].
+          cls_targets: (tensor) encoded class labels, sized [#anchors,].
+        '''
+        anchor_boxes = self.anchor_boxes
+        ious = box_iou(anchor_boxes, boxes)
+        max_ious, max_ids = ious.max(1)
+        boxes = boxes[max_ids]
+
+        # look up the linemap value (indexed [y, x]) at the center of each anchor box
+        anchor_ctrs = torch.zeros((anchor_boxes.shape[0], 2)).int()
+        anchor_ctrs[:, 0] = (anchor_boxes[:, 2] + anchor_boxes[:, 0]) / 2
+        anchor_ctrs[:, 1] = (anchor_boxes[:, 3] + anchor_boxes[:, 1]) / 2
+        linemap_val = np.asarray(linemap)[anchor_ctrs[:, 1], anchor_ctrs[:, 0]]
+
+        boxes = change_box_order(boxes, 'xyxy2xywh')
+        anchor_boxes = change_box_order(anchor_boxes, 'xyxy2xywh')
+
+        loc_xy = (boxes[:, :2] - anchor_boxes[:, :2]) / anchor_boxes[:, 2:]
+        loc_wh = torch.log(boxes[:, 2:] / anchor_boxes[:, 2:])
+        loc_targets = torch.cat([loc_xy, loc_wh], 1)
+        if self.create_bg_class:
+            cls_targets = 1 + labels[max_ids]
+        else:
+            # if background class 0 already exists in labels
+            cls_targets = labels[max_ids]
+
+        cls_targets[max_ious < 0.5] = 0  # WATCH OUT HERE, this is just for testing!!
+        # ignore = (max_ious > 0.4) & (max_ious < 0.5)  # ignore ious between [0.4,0.5]
+        # cls_targets[ignore] = -1  # mark ignored to -1
+
+        # ignore if anchor box is centered on a line detection and iou is below 0.35
+        ignore = torch.from_numpy(linemap_val.astype(np.uint8)) & (max_ious < 0.35)  # 0.5
+        cls_targets[ignore] = -1  # mark ignored to -1
+        return loc_targets, cls_targets
+
+    def decode(self, loc_preds, cls_preds, input_size, score_thresh=0.5, nms_thresh=0.5):
+        '''Decode outputs back to bounding box locations and class labels.
+
+        Args:
+          loc_preds: (tensor) predicted locations, sized [#anchors, 4].
+          cls_preds: (tensor) predicted class labels, sized [#anchors, #classes].
+          input_size: (tuple) model input size of (w,h).
+
+        Returns:
+          boxes: (tensor) decoded box locations, sized [#obj,4].
+          labels: (tensor) class labels for each box, sized [#obj,].
+        '''
+        CLS_THRESH = score_thresh
+        NMS_THRESH = nms_thresh
+
+        input_size = torch.Tensor(input_size)
+        # anchor_boxes = self._get_anchor_boxes(input_size)  # xywh
+        anchor_boxes = change_box_order(self._get_anchor_boxes(input_size), 'xyxy2xywh')
+
+        loc_xy = loc_preds[:, :2]
+        loc_wh = loc_preds[:, 2:]
+
+        xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
+        wh = loc_wh.exp() * anchor_boxes[:, 2:]
+        boxes = torch.cat([xy - wh / 2, xy + wh / 2], 1)  # [#anchors,4]
+
+        score, labels = cls_preds.sigmoid().max(1)  # [#anchors,]
+        ids = score > CLS_THRESH
+        ids = ids.nonzero().squeeze(1)  # [#obj,]; squeeze(1) keeps a 1-D index even for a single detection
+        keep = box_nms(boxes[ids], score[ids], threshold=NMS_THRESH)
+        return boxes[ids][keep], labels[ids][keep]  # , score[ids][keep]
+
+    def decode_boxes(self, loc_preds):
+
+        anchor_boxes = change_box_order(self.anchor_boxes, 'xyxy2xywh')
+
+        loc_xy = loc_preds[:, :2]
+        loc_wh = loc_preds[:, 2:]
+
+        xy = loc_xy * anchor_boxes[:, 2:] + anchor_boxes[:, :2]
+        wh = loc_wh.exp() * anchor_boxes[:, 2:]
+        box_preds = torch.cat([xy - wh / 2, xy + wh / 2], 1)
+
+        boxes = box_preds
+        return boxes
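The order in which this `encode` variant overwrites class targets determines which anchors end up positive, background, or ignored. The sketch below restates that order for a single anchor, assuming the thresholds 0.5 and 0.35 used in the code above. The function `assign_cls_target` and its parameter names are hypothetical and not part of the repository.

```python
def assign_cls_target(label, max_iou, on_line, create_bg_class=True,
                      pos_thresh=0.5, ignore_thresh=0.35):
    """Return the class target for one anchor.

    label:   class label of the best-overlapping object
    max_iou: IoU between the anchor and that object
    on_line: True if the anchor center lies on a detected text line
    """
    # Start from the matched label, shifted by one if background is class 0.
    target = 1 + label if create_bg_class else label
    # Weak overlap: treat the anchor as background.
    if max_iou < pos_thresh:
        target = 0
    # Very weak overlap on a detected line: exclude the anchor (-1).
    if on_line and max_iou < ignore_thresh:
        target = -1
    return target


targets = [
    assign_cls_target(3, 0.7, True),   # positive
    assign_cls_target(3, 0.4, True),   # background
    assign_cls_target(3, 0.2, True),   # ignored
    assign_cls_target(3, 0.2, False),  # background
]
```

Anchors on a detected line with little overlap are the ones most likely to cover unlabeled signs, which is presumably why they are marked -1 rather than trained as background.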
@@ -0,0 +1,358 @@
+'''Compute PASCAL_VOC MAP.
+
+Reference:
+  https://github.com/chainer/chainercv/blob/master/chainercv/evaluations/eval_detection_voc.py
+'''
+from __future__ import division
+
+import six
+import itertools
+import numpy as np
+
+from collections import defaultdict
+
+
+def voc_eval(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
+             gt_difficults=None, iou_thresh=0.5, use_07_metric=True):
+    '''Wrap VOC evaluation for PyTorch.'''
+    pred_bboxes = [xy2yx(b).numpy() for b in pred_bboxes]
+    pred_labels = [label.numpy() for label in pred_labels]
+    pred_scores = [score.numpy() for score in pred_scores]
+    gt_bboxes = [xy2yx(b).numpy() for b in gt_bboxes]
+    gt_labels = [label.numpy() for label in gt_labels]
+    return eval_detection_voc(
+               pred_bboxes, pred_labels, pred_scores, gt_bboxes,
+               gt_labels, gt_difficults, iou_thresh, use_07_metric)
+
+def xy2yx(boxes):
+    '''Convert box (xmin,ymin,xmax,ymax) to (ymin,xmin,ymax,xmax).'''
+    boxes = boxes.clone()  # copy so the caller's tensor is not modified in place
+    c0, c2 = boxes[:,0].clone(), boxes[:,2].clone()
+    boxes[:,0] = boxes[:,1]
+    boxes[:,1] = c0
+    boxes[:,2] = boxes[:,3]
+    boxes[:,3] = c2
+    return boxes
+
+def bbox_iou(bbox_a, bbox_b):
+    '''Calculate the Intersection over Union (IoU) between bounding boxes.
+
+    Args:
+        bbox_a (array): An array whose shape is :math:`(N, 4)`.
+            :math:`N` is the number of bounding boxes.
+            The dtype should be :obj:`numpy.float32`.
+        bbox_b (array): An array similar to :obj:`bbox_a`,
+            whose shape is :math:`(K, 4)`.
+            The dtype should be :obj:`numpy.float32`.
+
+    Returns:
+        array:
+        An array whose shape is :math:`(N, K)`. \
+        An element at index :math:`(n, k)` contains IoUs between \
+        :math:`n` th bounding box in :obj:`bbox_a` and :math:`k` th bounding \
+        box in :obj:`bbox_b`.
+    '''
+    # top left
+    tl = np.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
+    # bottom right
+    br = np.minimum(bbox_a[:, None, 2:], bbox_b[:, 2:])
+
+    area_i = np.prod(br - tl, axis=2) * (tl < br).all(axis=2)
+    area_a = np.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
+    area_b = np.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)
+    return area_i / (area_a[:, None] + area_b - area_i)
+
+def eval_detection_voc(
+        pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
+        gt_difficults=None,
+        iou_thresh=0.5, use_07_metric=False):
+    """Calculate average precisions based on evaluation code of PASCAL VOC.
+
+    This function evaluates predicted bounding boxes obtained from a dataset
+    which has :math:`N` images by using average precision for each class.
+    The code is based on the evaluation code used in PASCAL VOC Challenge.
+
+    Args:
+        pred_bboxes (iterable of numpy.ndarray): An iterable of :math:`N`
+            sets of bounding boxes.
+            Its index corresponds to an index for the base dataset.
+            Each element of :obj:`pred_bboxes` is a set of coordinates
+            of bounding boxes. This is an array whose shape is :math:`(R, 4)`,
+            where :math:`R` corresponds
+            to the number of bounding boxes, which may vary among images.
+            The second axis corresponds to
+            :math:`y_{min}, x_{min}, y_{max}, x_{max}` of a bounding box.
+        pred_labels (iterable of numpy.ndarray): An iterable of labels.
+            Similar to :obj:`pred_bboxes`, its index corresponds to an
+            index for the base dataset. Its length is :math:`N`.
+        pred_scores (iterable of numpy.ndarray): An iterable of confidence
+            scores for predicted bounding boxes. Similar to :obj:`pred_bboxes`,
+            its index corresponds to an index for the base dataset.
+            Its length is :math:`N`.
+        gt_bboxes (iterable of numpy.ndarray): An iterable of ground truth
+            bounding boxes
+            whose length is :math:`N`. An element of :obj:`gt_bboxes` is a
+            bounding box whose shape is :math:`(R, 4)`. Note that the number of
+            bounding boxes in each image does not need to be same as the number
+            of corresponding predicted boxes.
+        gt_labels (iterable of numpy.ndarray): An iterable of ground truth
+            labels which are organized similarly to :obj:`gt_bboxes`.
+        gt_difficults (iterable of numpy.ndarray): An iterable of boolean
+            arrays which is organized similarly to :obj:`gt_bboxes`.
+            This tells whether the
+            corresponding ground truth bounding box is difficult or not.
+            By default, this is :obj:`None`. In that case, this function
+            considers all bounding boxes to be not difficult.
+        iou_thresh (float): A prediction is correct if its Intersection over
+            Union with the ground truth is above this value.
+        use_07_metric (bool): Whether to use PASCAL VOC 2007 evaluation metric
+            for calculating average precision. The default value is
+            :obj:`False`.
+
+    Returns:
+        dict:
+
+        The keys, value-types and the description of the values are listed
+        below.
+
+        * **ap** (*numpy.ndarray*): An array of average precisions. \
+            The :math:`l`-th value corresponds to the average precision \
+            for class :math:`l`. If class :math:`l` does not exist in \
+            either :obj:`pred_labels` or :obj:`gt_labels`, the corresponding \
+            value is set to :obj:`numpy.nan`.
+        * **map** (*float*): The average of Average Precisions over classes.
+
+    """
+
+    prec, rec = calc_detection_voc_prec_rec(
+        pred_bboxes, pred_labels, pred_scores,
+        gt_bboxes, gt_labels, gt_difficults,
+        iou_thresh=iou_thresh)
+
+    ap = calc_detection_voc_ap(prec, rec, use_07_metric=use_07_metric)
+
+    return {'ap': ap, 'map': np.nanmean(ap)}
+
+
+def calc_detection_voc_prec_rec(
+        pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
+        gt_difficults=None,
+        iou_thresh=0.5):
+    """Calculate precision and recall based on evaluation code of PASCAL VOC.
+
+    This function calculates precision and recall of
+    predicted bounding boxes obtained from a dataset which has :math:`N`
+    images.
+    The code is based on the evaluation code used in PASCAL VOC Challenge.
+
+    Args:
+        pred_bboxes (iterable of numpy.ndarray): An iterable of :math:`N`
+            sets of bounding boxes.
+            Its index corresponds to an index for the base dataset.
+            Each element of :obj:`pred_bboxes` is a set of coordinates
+            of bounding boxes. This is an array whose shape is :math:`(R, 4)`,
+            where :math:`R` corresponds
+            to the number of bounding boxes, which may vary among images.
+            The second axis corresponds to
+            :math:`y_{min}, x_{min}, y_{max}, x_{max}` of a bounding box.
+        pred_labels (iterable of numpy.ndarray): An iterable of labels.
+            Similar to :obj:`pred_bboxes`, its index corresponds to an
+            index for the base dataset. Its length is :math:`N`.
+        pred_scores (iterable of numpy.ndarray): An iterable of confidence
+            scores for predicted bounding boxes. Similar to :obj:`pred_bboxes`,
+            its index corresponds to an index for the base dataset.
+            Its length is :math:`N`.
+        gt_bboxes (iterable of numpy.ndarray): An iterable of ground truth
+            bounding boxes
+            whose length is :math:`N`. An element of :obj:`gt_bboxes` is a
+            bounding box whose shape is :math:`(R, 4)`. Note that the number of
+            bounding boxes in each image does not need to be same as the number
+            of corresponding predicted boxes.
+        gt_labels (iterable of numpy.ndarray): An iterable of ground truth
+            labels which are organized similarly to :obj:`gt_bboxes`.
+        gt_difficults (iterable of numpy.ndarray): An iterable of boolean
+            arrays which is organized similarly to :obj:`gt_bboxes`.
+            This tells whether the
+            corresponding ground truth bounding box is difficult or not.
+            By default, this is :obj:`None`. In that case, this function
+            considers all bounding boxes to be not difficult.
+        iou_thresh (float): A prediction is correct if its Intersection over
+            Union with the ground truth is above this value.
+
+    Returns:
+        tuple of two lists:
+        This function returns two lists: :obj:`prec` and :obj:`rec`.
+
+        * :obj:`prec`: A list of arrays. :obj:`prec[l]` is precision \
+            for class :math:`l`. If class :math:`l` does not exist in \
+            either :obj:`pred_labels` or :obj:`gt_labels`, :obj:`prec[l]` is \
+            set to :obj:`None`.
+        * :obj:`rec`: A list of arrays. :obj:`rec[l]` is recall \
+            for class :math:`l`. If class :math:`l` that is not marked as \
+            difficult does not exist in \
+            :obj:`gt_labels`, :obj:`rec[l]` is \
+            set to :obj:`None`.
+
+    """
+
+    pred_bboxes = iter(pred_bboxes)
+    pred_labels = iter(pred_labels)
+    pred_scores = iter(pred_scores)
+    gt_bboxes = iter(gt_bboxes)
+    gt_labels = iter(gt_labels)
+    if gt_difficults is None:
+        gt_difficults = itertools.repeat(None)
+    else:
+        gt_difficults = iter(gt_difficults)
+
+    n_pos = defaultdict(int)
+    score = defaultdict(list)
+    match = defaultdict(list)
+
+    for pred_bbox, pred_label, pred_score, gt_bbox, gt_label, gt_difficult in \
+        six.moves.zip(
+            pred_bboxes, pred_labels, pred_scores,
+            gt_bboxes, gt_labels, gt_difficults):
+
+        if gt_difficult is None:
+            gt_difficult = np.zeros(gt_bbox.shape[0], dtype=bool)
+
+        for l in np.unique(np.concatenate((pred_label, gt_label)).astype(int)):
+            pred_mask_l = pred_label == l
+            pred_bbox_l = pred_bbox[pred_mask_l]
+            pred_score_l = pred_score[pred_mask_l]
+            # sort by score
+            order = pred_score_l.argsort()[::-1]
+            pred_bbox_l = pred_bbox_l[order]
+            pred_score_l = pred_score_l[order]
+
+            gt_mask_l = gt_label == l
+            gt_bbox_l = gt_bbox[gt_mask_l]
+            gt_difficult_l = gt_difficult[gt_mask_l]
+
+            n_pos[l] += np.logical_not(gt_difficult_l).sum()
+            score[l].extend(pred_score_l)
+
+            if len(pred_bbox_l) == 0:
+                continue
+            if len(gt_bbox_l) == 0:
+                match[l].extend((0,) * pred_bbox_l.shape[0])
+                continue
+
+            # VOC evaluation follows integer typed bounding boxes.
+            pred_bbox_l = pred_bbox_l.copy()
+            pred_bbox_l[:, 2:] += 1
+            gt_bbox_l = gt_bbox_l.copy()
+            gt_bbox_l[:, 2:] += 1
+
+            iou = bbox_iou(pred_bbox_l, gt_bbox_l)
+            gt_index = iou.argmax(axis=1)
+            # set -1 if there is no matching ground truth
+            gt_index[iou.max(axis=1) < iou_thresh] = -1
+            del iou
+
+            selec = np.zeros(gt_bbox_l.shape[0], dtype=bool)
+            for gt_idx in gt_index:
+                if gt_idx >= 0:
+                    if gt_difficult_l[gt_idx]:
+                        match[l].append(-1)
+                    else:
+                        if not selec[gt_idx]:
+                            match[l].append(1)
+                        else:
+                            match[l].append(0)
+                    selec[gt_idx] = True
+                else:
+                    match[l].append(0)
+
+    for iter_ in (
+            pred_bboxes, pred_labels, pred_scores,
+            gt_bboxes, gt_labels, gt_difficults):
+        if next(iter_, None) is not None:
+            raise ValueError('Lengths of input iterables need to be the same.')
+
+    n_fg_class = max(n_pos.keys()) + 1
+    prec = [None] * n_fg_class
+    rec = [None] * n_fg_class
+
+    for l in n_pos.keys():
+        score_l = np.array(score[l])
+        match_l = np.array(match[l], dtype=np.int8)
+
+        order = score_l.argsort()[::-1]
+        match_l = match_l[order]
+
+        tp = np.cumsum(match_l == 1)
+        fp = np.cumsum(match_l == 0)
+
+        # If an element of fp + tp is 0,
+        # the corresponding element of prec[l] is nan.
+        prec[l] = tp / (fp + tp)
+        # If n_pos[l] is 0, rec[l] is None.
+        if n_pos[l] > 0:
+            rec[l] = tp / n_pos[l]
+
+    return prec, rec
+
+
+def calc_detection_voc_ap(prec, rec, use_07_metric=False):
+    """Calculate average precisions based on evaluation code of PASCAL VOC.
+
+    This function calculates average precisions
+    from given precisions and recalls.
+    The code is based on the evaluation code used in PASCAL VOC Challenge.
+
+    Args:
+        prec (list of numpy.array): A list of arrays.
+            :obj:`prec[l]` indicates precision for class :math:`l`.
+            If :obj:`prec[l]` is :obj:`None`, this function returns
+            :obj:`numpy.nan` for class :math:`l`.
+        rec (list of numpy.array): A list of arrays.
+            :obj:`rec[l]` indicates recall for class :math:`l`.
+            If :obj:`rec[l]` is :obj:`None`, this function returns
+            :obj:`numpy.nan` for class :math:`l`.
+        use_07_metric (bool): Whether to use PASCAL VOC 2007 evaluation metric
+            for calculating average precision. The default value is
+            :obj:`False`.
+
+    Returns:
+        ~numpy.ndarray:
+        This function returns an array of average precisions.
+        The :math:`l`-th value corresponds to the average precision
+        for class :math:`l`. If :obj:`prec[l]` or :obj:`rec[l]` is
+        :obj:`None`, the corresponding value is set to :obj:`numpy.nan`.
+
+    """
+
+    n_fg_class = len(prec)
+    ap = np.empty(n_fg_class)
+    for l in six.moves.range(n_fg_class):
+        if prec[l] is None or rec[l] is None:
+            ap[l] = np.nan
+            continue
+
+        if use_07_metric:
+            # 11 point metric
+            ap[l] = 0
+            for t in np.arange(0., 1.1, 0.1):
+                if np.sum(rec[l] >= t) == 0:
+                    p = 0
+                else:
+                    p = np.max(np.nan_to_num(prec[l])[rec[l] >= t])
+                ap[l] += p / 11
+        else:
+            # correct AP calculation
+            # first add sentinel values at both ends
+            mpre = np.concatenate(([0], np.nan_to_num(prec[l]), [0]))
+            mrec = np.concatenate(([0], rec[l], [1]))
+
+            mpre = np.maximum.accumulate(mpre[::-1])[::-1]
+
+            # to calculate area under PR curve, look for points
+            # where X axis (recall) changes value
+            i = np.where(mrec[1:] != mrec[:-1])[0]
+
+            # and sum (\Delta recall) * prec
+            ap[l] = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
+
+    return ap
@@ -0,0 +1 @@
+
@@ -0,0 +1,84 @@
+from __future__ import print_function
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+from torch.autograd import Variable
+from ..one_hot_embedding import one_hot_embedding
+
+
+class FocalLoss(nn.Module):
+    def __init__(self, num_classes):
+        super(FocalLoss, self).__init__()
+        self.num_classes = num_classes
+
+    def _focal_loss(self, x, y):
+        '''Focal loss.
+
+        This is described in the original paper.
+        With BCELoss, the background should not be counted in num_classes.
+
+        Args:
+          x: (tensor) predictions, sized [N,D].
+          y: (tensor) targets, sized [N,].
+
+        Return:
+          (tensor) focal loss.
+        '''
+        alpha = 0.25  # balance param
+        gamma = 2  # focus param
+        size_average = False
+
+        t = one_hot_embedding(y, self.num_classes) # y-1
+        p = x.sigmoid()
+        pt = torch.where(t > 0, p, 1 - p)  # pt = p if t > 0 else 1-p
+        w = (1 - pt).pow(gamma)
+        w = torch.where(t > 0, alpha * w, (1 - alpha) * w)
+        loss = F.binary_cross_entropy_with_logits(x, t, w, reduction='mean' if size_average else 'sum')
+
+        # according to https://github.com/c0nn3r/RetinaNet/blob/master/focal_loss.py
+        # logpt = - F.cross_entropy(x, y)
+        # pt = torch.exp(logpt)
+        # focal_loss = -((1 - pt) ** gamma) * logpt
+        # loss = alpha * focal_loss
+        # averaging (or not) loss
+        # if size_average:
+        #     loss = loss.mean()
+        # else:
+        #     loss = loss.sum()
+        return loss
+
+    def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
+        '''Compute loss between (loc_preds, loc_targets) and (cls_preds, cls_targets).
+
+        Args:
+          loc_preds: (tensor) predicted locations, sized [batch_size, #anchors, 4].
+          loc_targets: (tensor) encoded target locations, sized [batch_size, #anchors, 4].
+          cls_preds: (tensor) predicted class confidences, sized [batch_size, #anchors, #classes].
+          cls_targets: (tensor) encoded target labels, sized [batch_size, #anchors].
+
+        Return:
+          (tensor) loss = SmoothL1Loss(loc_preds, loc_targets) + FocalLoss(cls_preds, cls_targets).
+        '''
+        batch_size, num_boxes = cls_targets.size()
+        pos = cls_targets > 0  # [N,#anchors]
+        num_pos = max(pos.sum().item(), 1)  # guard against division by zero when no anchor is positive
+
+        # ===============================================================
+        # loc_loss = SmoothL1Loss(pos_loc_preds, pos_loc_targets)
+        # ===============================================================
+        mask = pos.unsqueeze(2).expand_as(loc_preds)  # [N,#anchors,4]
+        loc_loss = F.smooth_l1_loss(loc_preds[mask], loc_targets[mask], reduction='sum')
+
+        # ===============================================================
+        # cls_loss = FocalLoss(cls_preds, cls_targets)
+        # ===============================================================
+        pos_neg = cls_targets > -1  # exclude ignored anchors
+        mask = pos_neg.unsqueeze(2).expand_as(cls_preds)
+        masked_cls_preds = cls_preds[mask].view(-1, self.num_classes)
+        cls_loss = self._focal_loss(masked_cls_preds, cls_targets[pos_neg])
+
+        print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item() / num_pos, cls_loss.item() / num_pos), end=' | ')
+        loss = (loc_loss + cls_loss) / num_pos
+        return loss
@@ -0,0 +1,71 @@
+from __future__ import print_function
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+
+class SSDLoss(nn.Module):
+    def __init__(self, num_classes):
+        super(SSDLoss, self).__init__()
+        self.num_classes = num_classes
+
+    def _hard_negative_mining(self, cls_loss, pos):
+        '''Return a mask selecting the hardest negatives, 3x as many as positives.
+
+        Args:
+          cls_loss: (tensor) cross entropy loss between cls_preds and cls_targets, sized [N,#anchors].
+          pos: (tensor) positive class mask, sized [N,#anchors].
+
+        Return:
+          (tensor) boolean mask of the selected negatives, sized [N,#anchors].
+        '''
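+        cls_loss = cls_loss * (pos.float() - 1)  # 0 for positives, -loss for negatives
+
+        _, idx = cls_loss.sort(1)  # ascending, so the largest negative losses come first
+        _, rank = idx.sort(1)      # rank of each anchor in that order, [N,#anchors]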
+
+        num_neg = 3*pos.sum(1)  # [N,]
+        neg = rank < num_neg[:,None]   # [N,#anchors]
+        return neg
+
+    def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
+        '''Compute loss between (loc_preds, loc_targets) and (cls_preds, cls_targets).
+
+        Args:
+          loc_preds: (tensor) predicted locations, sized [N, #anchors, 4].
+          loc_targets: (tensor) encoded target locations, sized [N, #anchors, 4].
+          cls_preds: (tensor) predicted class confidences, sized [N, #anchors, #classes].
+          cls_targets: (tensor) encoded target labels, sized [N, #anchors].
+
+        Return:
+          (tensor) loss = SmoothL1Loss(loc_preds, loc_targets) + CrossEntropyLoss(cls_preds, cls_targets).
+        '''
+        pos = cls_targets > 0  # [N,#anchors]
+        batch_size = pos.size(0)
+        num_pos = pos.sum().item()
+
+        #===============================================================
+        # loc_loss = SmoothL1Loss(pos_loc_preds, pos_loc_targets)
+        #===============================================================
+        mask = pos.unsqueeze(2).expand_as(loc_preds)       # [N,#anchors,4]
+        loc_loss = F.smooth_l1_loss(loc_preds[mask], loc_targets[mask], reduction='sum')
+
+        #===============================================================
+        # cls_loss = CrossEntropyLoss(cls_preds, cls_targets)
+        #===============================================================
+        # TD: added clamp, because cross entropy does not handle negative indices well
+        cls_loss = F.cross_entropy(cls_preds.view(-1, self.num_classes),
+                                   cls_targets.clamp(min=0).view(-1), reduction='none')  # [N*#anchors,]
+        cls_loss = cls_loss.view(batch_size, -1)
+        cls_loss[cls_targets < 0] = 0  # set ignored loss to 0
+
+        neg = self._hard_negative_mining(cls_loss, pos)  # [N,#anchors]
+        cls_loss = cls_loss[pos|neg].sum()
+        if num_pos > 0:  # TD mod to prevent div by zero
+            print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item()/num_pos, cls_loss.item()/num_pos), end=' | ')
+            loss = (loc_loss+cls_loss)/num_pos
+        else:
+            print('num_pos zero exception')
+            loss = (loc_loss+cls_loss)/1.
+        return loss
@@ -0,0 +1,38 @@
+import torch
+
+
+def meshgrid(x, y, row_major=True):
+    '''Return meshgrid in range x & y.
+
+    Args:
+      x: (int) first dim range.
+      y: (int) second dim range.
+      row_major: (bool) row major or column major.
+
+    Returns:
+      (tensor) meshgrid, sized [x*y,2]
+
+    Example:
+    >> meshgrid(3,2)
+    0  0
+    1  0
+    2  0
+    0  1
+    1  1
+    2  1
+    [torch.FloatTensor of size 6x2]
+
+    >> meshgrid(3,2,row_major=False)
+    0  0
+    0  1
+    0  2
+    1  0
+    1  1
+    1  2
+    [torch.FloatTensor of size 6x2]
+    '''
+    a = torch.arange(0, x, dtype=torch.float)  # TD: make it float (0.4.1)
+    b = torch.arange(0, y, dtype=torch.float)  # TD: make it float (0.4.1)
+    xx = a.repeat(y).view(-1, 1)
+    yy = b.view(-1, 1).repeat(1, x).view(-1, 1)
+    return torch.cat([xx, yy], 1) if row_major else torch.cat([yy, xx], 1)
@@ -0,0 +1,53 @@
+import torch
+import torch.nn as nn
+
+
+# this should import a specific architecture for cuneiform sign detection
+
+
+class FPNSSD(nn.Module):
+    num_anchors = 12
+
+    def __init__(self, fpn_model, num_classes):
+        super(FPNSSD, self).__init__()
+        self.fpn = fpn_model
+        self.num_classes = num_classes
+        self.loc_head = self._make_head(self.num_anchors * 4)
+        self.cls_head = self._make_head(self.num_anchors * self.num_classes)
+
+    def forward(self, x):
+        loc_preds = []
+        cls_preds = []
+        fms = self.fpn(x)
+        for fm in fms:
+            loc_pred = self.loc_head(fm)
+            cls_pred = self.cls_head(fm)
+            # A = num_anchors: [N,A*4,H,W] -> [N,H,W,A*4] -> [N,H*W*A,4]
+            loc_pred = loc_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1, 4)
+            # [N,A*NC,H,W] -> [N,H,W,A*NC] -> [N,H*W*A,NC]
+            cls_pred = cls_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1, self.num_classes)
+            loc_preds.append(loc_pred)
+            cls_preds.append(cls_pred)
+        return torch.cat(loc_preds, 1), torch.cat(cls_preds, 1)
+
+    def _make_head(self, out_planes):
+        layers = []
+        for _ in range(4):
+            layers.append(nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1))
+            layers.append(nn.ReLU(True))
+        layers.append(nn.Conv2d(256, out_planes, kernel_size=3, stride=1, padding=1))
+        return nn.Sequential(*layers)
+
+    def freeze_bn(self):
+        '''Freeze BatchNorm layers.'''
+        for layer in self.modules():
+            if isinstance(layer, nn.BatchNorm2d):
+                layer.eval()
+
+
+# def test():
+#     net = FPNSSD(21)
+#     loc_preds, cls_preds = net(torch.randn(1, 3, 512, 512))
+#     print(loc_preds.size(), cls_preds.size())
+
+# test()
@@ -0,0 +1,54 @@
+import torch
+import torch.nn as nn
+
+
+# this should import a specific architecture for cuneiform sign detection
+
+
+class RPN(nn.Module):
+    num_anchors = 12
+
+    def __init__(self, fpn_model, num_classes, with_64):
+        super(RPN, self).__init__()
+        self.fpn = fpn_model
+        self.num_classes = num_classes
+        self.with_p4 = int(with_64)
+        self.loc_head = self._make_head(self.num_anchors * 4)
+        self.cls_head = self._make_head(self.num_anchors * self.num_classes)
+
+    def forward(self, x):
+        loc_preds = []
+        cls_preds = []
+        fms = self.fpn(x)
+        for fm in fms:
+            loc_pred = self.loc_head(fm)
+            cls_pred = self.cls_head(fm)
+            # A = num_anchors: [N,A*4,H,W] -> [N,H,W,A*4] -> [N,H*W*A,4]
+            loc_pred = loc_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1, 4)
+            # [N,A*NC,H,W] -> [N,H,W,A*NC] -> [N,H*W*A,NC]
+            cls_pred = cls_pred.permute(0, 2, 3, 1).reshape(x.size(0), -1, self.num_classes)
+            loc_preds.append(loc_pred)
+            cls_preds.append(cls_pred)
+        return torch.cat(loc_preds, 1), torch.cat(cls_preds, 1), fms[self.with_p4]
+
+    def _make_head(self, out_planes):
+        layers = []
+        for _ in range(4):
+            layers.append(nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1))
+            layers.append(nn.ReLU(True))
+        layers.append(nn.Conv2d(256, out_planes, kernel_size=3, stride=1, padding=1))
+        return nn.Sequential(*layers)
+
+    def freeze_bn(self):
+        '''Freeze BatchNorm layers.'''
+        for layer in self.modules():
+            if isinstance(layer, nn.BatchNorm2d):
+                layer.eval()
+
+
+# def test():
+#     net = FPNSSD(21)
+#     loc_preds, cls_preds = net(torch.randn(1, 3, 512, 512))
+#     print(loc_preds.size(), cls_preds.size())
+
+# test()
@@ -0,0 +1,15 @@
+import torch
+
+
+def one_hot_embedding(labels, num_classes):
+    '''Embedding labels to one-hot.
+
+    Args:
+      labels: (LongTensor) class labels, sized [N,].
+      num_classes: (int) number of classes.
+
+    Returns:
+      (tensor) encoded labels, sized [N,#classes].
+    '''
+    y = torch.eye(num_classes, device=labels.device)  # [D,D]
+    return y[labels]  # [N,D]
@@ -0,0 +1,22 @@
+import torch
+
+
+def center_crop(img, boxes, size):
+    '''Crops the given PIL Image at the center.
+    Args:
+      img: (PIL.Image) image to be cropped.
+      boxes: (tensor) object boxes, sized [#obj,4].
+      size (tuple): desired output size of (w,h).
+    Returns:
+      img: (PIL.Image) center cropped image.
+      boxes: (tensor) center cropped boxes.
+    '''
+    w, h = img.size
+    ow, oh = size
+    i = int(round((h - oh) / 2.))
+    j = int(round((w - ow) / 2.))
+    img = img.crop((j, i, j + ow, i + oh))
+    boxes -= torch.Tensor([j, i, j, i])
+    boxes[:, 0::2].clamp_(min=0, max=ow - 1)  # in-place; plain clamp() would discard its result
+    boxes[:, 1::2].clamp_(min=0, max=oh - 1)
+    return img, boxes
@@ -0,0 +1,25 @@
+import math
+import torch
+import random
+
+from PIL import Image
+from ..box import box_iou, box_clamp
+
+
+def crop_box(img, boxes, labels, box):
+    x, y, x2, y2 = box
+    w = x2 - x
+    h = y2 - y
+    img = img.crop((x, y, x2, y2))
+    # check if center is still inside tile_box, otherwise ignore box
+    # (if center is not inside tile box, not possible to get IoU >= 0.5 --> treated as background anyways)
+    center = (boxes[:, :2] + boxes[:, 2:]) / 2
+    mask = (center[:, 0] >= x) & (center[:, 0] <= x2) & (center[:, 1] >= y) & (center[:, 1] <= y2)
+    if mask.any():
+        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
+        boxes = box_clamp(boxes, 0, 0, w, h)
+        labels = labels[mask]
+    else:
+        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
+        labels = torch.tensor([0], dtype=torch.long)
+    return img, boxes, labels
@@ -0,0 +1,23 @@
+import torch
+import random
+
+from PIL import Image
+
+
+def pad(img, target_size):
+    '''Pad image with zeros to the specified size.
+
+    Args:
+      img: (PIL.Image) image to be padded.
+      target_size: (tuple) target size of (ow,oh).
+
+    Returns:
+      img: (PIL.Image) padded image.
+
+    Reference:
+      `tf.image.pad_to_bounding_box`
+    '''
+    w, h = img.size
+    canvas = Image.new('RGB', target_size)
+    canvas.paste(img, (0,0))  # paste at the top-left corner
+    return canvas
@@ -0,0 +1,23 @@
+import torch
+import random
+
+from PIL import Image
+
+
+def pad(img, target_size):
+    '''Pad image with zeros to the specified size.
+
+    Args:
+      img: (PIL.Image) image to be padded.
+      target_size: (tuple) target size of (ow,oh).
+
+    Returns:
+      img: (PIL.Image) padded image.
+
+    Reference:
+      `tf.image.pad_to_bounding_box`
+    '''
+    w, h = img.size
+    canvas = Image.new('L', target_size)
+    canvas.paste(img, (0, 0))  # paste at the top-left corner
+    return canvas
@@ -0,0 +1,64 @@
+'''This random crop strategy is described in paper:
+   [1] SSD: Single Shot MultiBox Detector
+'''
+import math
+import torch
+import random
+
+from PIL import Image
+# from torchcv.utils.box import box_iou, box_clamp
+from ..box import box_iou, box_clamp
+
+
+def random_crop(
+        img, boxes, labels,
+        min_scale=0.3,
+        max_aspect_ratio=2.):
+    '''Randomly crop a PIL image.
+
+    Args:
+      img: (PIL.Image) image.
+      boxes: (tensor) bounding boxes, sized [#obj, 4].
+      labels: (tensor) bounding box labels, sized [#obj,].
+      min_scale: (float) minimal image width/height scale.
+      max_aspect_ratio: (float) maximum width/height aspect ratio.
+
+    Returns:
+      img: (PIL.Image) cropped image.
+      boxes: (tensor) object boxes.
+      labels: (tensor) object labels.
+    '''
+    imw, imh = img.size
+    params = [(0, 0, imw, imh)]  # candidate crop windows (x,y,w,h); the full image is always one
+    for min_iou in (0, 0.1, 0.3, 0.5, 0.7, 0.9):
+        for _ in range(100):
+            scale = random.uniform(min_scale, 1)
+            aspect_ratio = random.uniform(
+                max(1 / max_aspect_ratio, scale * scale),
+                min(max_aspect_ratio, 1 / (scale * scale)))
+            w = int(imw * scale * math.sqrt(aspect_ratio))
+            h = int(imh * scale / math.sqrt(aspect_ratio))
+
+            x = random.randint(0, imw - w)  # inclusive; randrange(0) raises when w == imw
+            y = random.randint(0, imh - h)
+
+            roi = torch.tensor([[x, y, x + w, y + h]], dtype=torch.float)
+            ious = box_iou(boxes, roi)
+            if ious.min() >= min_iou:
+                params.append((x, y, w, h))
+                break
+
+    x, y, w, h = random.choice(params)
+    img = img.crop((x, y, x + w, y + h))
+
+    center = (boxes[:, :2] + boxes[:, 2:]) / 2
+    mask = (center[:, 0] >= x) & (center[:, 0] <= x + w) \
+           & (center[:, 1] >= y) & (center[:, 1] <= y + h)
+    if mask.any():
+        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
+        boxes = box_clamp(boxes, 0, 0, w, h)
+        labels = labels[mask]
+    else:
+        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
+        labels = torch.tensor([0], dtype=torch.long)
+    return img, boxes, labels
@@ -0,0 +1,55 @@
+'''This random crop strategy is described in paper:
+   [1] SSD: Single Shot MultiBox Detector
+'''
+import math
+import torch
+import random
+
+from PIL import Image
+# from torchcv.utils.box import box_iou, box_clamp
+from ..box import box_iou, box_clamp
+
+
+def random_crop_tile(
+        img, boxes, labels,
+        scale_range=(0.8, 1),
+        max_aspect_ratio=2.):
+    '''Randomly crop a PIL image.
+
+    Args:
+      img: (PIL.Image) image.
+      boxes: (tensor) bounding boxes, sized [#obj, 4].
+      labels: (tensor) bounding box labels, sized [#obj,].
+      scale_range: (float, float) min and max image width/height scale.
+      max_aspect_ratio: (float) maximum width/height aspect ratio.
+
+    Returns:
+      img: (PIL.Image) cropped image.
+      boxes: (tensor) object boxes.
+      labels: (tensor) object labels.
+    '''
+    imw, imh = img.size
+
+    scale = random.uniform(scale_range[0], scale_range[1])
+    aspect_ratio = random.uniform(
+        max(1 / max_aspect_ratio, scale * scale),
+        min(max_aspect_ratio, 1 / (scale * scale)))
+    w = int(imw * scale * math.sqrt(aspect_ratio))
+    h = int(imh * scale / math.sqrt(aspect_ratio))
+
+    x = random.randint(0, imw - w)  # inclusive; randrange(0) raises when w == imw
+    y = random.randint(0, imh - h)
+
+    img = img.crop((x, y, x + w, y + h))
+
+    center = (boxes[:, :2] + boxes[:, 2:]) / 2
+    mask = (center[:, 0] >= x) & (center[:, 0] <= x + w) \
+           & (center[:, 1] >= y) & (center[:, 1] <= y + h)
+    if mask.any():
+        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
+        boxes = box_clamp(boxes, 0, 0, w, h)
+        labels = labels[mask]
+    else:
+        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
+        labels = torch.tensor([0], dtype=torch.long)
+    return img, boxes, labels
@@ -0,0 +1,56 @@
+import torch
+import random
+import torchvision.transforms as transforms
+
+from PIL import Image
+
+
+def random_distort(
+        img,
+        brightness_delta=32 / 255.,
+        contrast_delta=0.5,
+        saturation_delta=0.5,
+        hue_delta=0.1):
+    '''A color related data augmentation used in SSD.
+
+    Args:
+      img: (PIL.Image) image to be color augmented.
+      brightness_delta: (float) shift of brightness, range from [1-delta,1+delta].
+      contrast_delta: (float) shift of contrast, range from [1-delta,1+delta].
+      saturation_delta: (float) shift of saturation, range from [1-delta,1+delta].
+      hue_delta: (float) shift of hue, range from [-delta,delta].
+
+    Returns:
+      img: (PIL.Image) color augmented image.
+    '''
+
+    def brightness(img, delta):
+        if random.random() < 0.5:
+            img = transforms.ColorJitter(brightness=delta)(img)
+        return img
+
+    def contrast(img, delta):
+        if random.random() < 0.5:
+            img = transforms.ColorJitter(contrast=delta)(img)
+        return img
+
+    def saturation(img, delta):
+        if random.random() < 0.5:
+            img = transforms.ColorJitter(saturation=delta)(img)
+        return img
+
+    def hue(img, delta):
+        if random.random() < 0.5:
+            img = transforms.ColorJitter(hue=delta)(img)
+        return img
+
+    img = brightness(img, brightness_delta)
+    if random.random() < 0.5:
+        img = contrast(img, contrast_delta)
+        img = saturation(img, saturation_delta)
+        img = hue(img, hue_delta)
+    else:
+        img = saturation(img, saturation_delta)
+        img = hue(img, hue_delta)
+        img = contrast(img, contrast_delta)
+    return img
@@ -0,0 +1,28 @@
+import torch
+import random
+
+from PIL import Image
+
+
+def random_flip(img, boxes):
+    '''Randomly flip PIL image.
+
+    If boxes is not None, flip boxes accordingly.
+
+    Args:
+      img: (PIL.Image) image to be flipped.
+      boxes: (tensor) object boxes, sized [#obj,4].
+
+    Returns:
+      img: (PIL.Image) randomly flipped image.
+      boxes: (tensor) randomly flipped boxes.
+    '''
+    if random.random() < 0.5:
+        img = img.transpose(Image.FLIP_LEFT_RIGHT)
+        w = img.width
+        if boxes is not None:
+            xmin = w - boxes[:,2]
+            xmax = w - boxes[:,0]
+            boxes[:,0] = xmin
+            boxes[:,2] = xmax
+    return img, boxes
@@ -0,0 +1,33 @@
+import torch
+import random
+
+from PIL import Image
+
+
+def random_paste(img, boxes, max_ratio=4, fill=0):
+    '''Randomly paste the input image on a larger canvas.
+
+    If boxes is not None, adjust boxes accordingly.
+
+    Args:
+      img: (PIL.Image) image to be pasted.
+      boxes: (tensor) object boxes, sized [#obj,4].
+      max_ratio: (int) maximum ratio of expansion.
+      fill: (tuple) the RGB value to fill the canvas.
+
+    Returns:
+      canvas: (PIL.Image) canvas with image pasted.
+      boxes: (tensor) adjusted object boxes.
+    '''
+    w, h = img.size
+    ratio = random.uniform(1, max_ratio)
+    ow, oh = int(w*ratio), int(h*ratio)
+    canvas = Image.new('RGB', (ow,oh), fill)
+
+    x = random.randint(0, ow - w)
+    y = random.randint(0, oh - h)
+    canvas.paste(img, (x,y))
+
+    if boxes is not None:
+        boxes = boxes + torch.tensor([x,y,x,y], dtype=torch.float)
+    return canvas, boxes
@@ -0,0 +1,60 @@
+import torch
+import random
+
+from PIL import Image
+
+
+def resize(img, boxes, size, max_size=1000, scale=None, random_interpolation=False):
+    '''Resize the input PIL image to given size.
+
+    If boxes is not None, resize boxes accordingly.
+
+    Args:
+      img: (PIL.Image) image to be resized.
+      boxes: (tensor) object boxes, sized [#obj,4].
+      size: (tuple or int)
+        - if is tuple, resize image to the size.
+        - if is int, resize the shorter side to the size while maintaining the aspect ratio.
+      max_size: (int) when size is int, cap the longer side at max_size (limits GPU memory use).
+      scale: (float) if not None, scale both sides by this factor; size and max_size are ignored.
+      random_interpolation: (bool) randomly choose a resize interpolation method.
+
+    Returns:
+      img: (PIL.Image) resized image.
+      boxes: (tensor) resized boxes.
+
+    Example:
+    >> img, boxes = resize(img, boxes, 600)  # resize shorter side to 600
+    >> img, boxes = resize(img, boxes, (500,600))  # resize image size to (500,600)
+    >> img, _ = resize(img, None, (500,600))  # resize image only
+    '''
+    w, h = img.size
+    if scale is None:
+        if isinstance(size, int):
+            size_min = min(w,h)
+            size_max = max(w,h)
+            sw = sh = float(size) / size_min
+            if sw * size_max > max_size:
+                sw = sh = float(max_size) / size_max
+            ow = int(w * sw + 0.5)
+            oh = int(h * sh + 0.5)
+        else:
+            ow, oh = size
+            sw = float(ow) / w
+            sh = float(oh) / h
+    else:
+        ow = int(w * scale)
+        oh = int(h * scale)
+        sw, sh = scale, scale
+
+    method = random.choice([
+        Image.BOX,
+        Image.NEAREST,
+        Image.HAMMING,
+        Image.BICUBIC,
+        Image.LANCZOS,
+        Image.BILINEAR]) if random_interpolation else Image.BILINEAR
+    img = img.resize((ow,oh), method)
+    if boxes is not None:
+        boxes = boxes * torch.tensor([sw,sh,sw,sh])
+    return img, boxes
@@ -0,0 +1,36 @@
+import torch
+import random
+
+from PIL import Image
+
+
+def scale_jitter(img, boxes, sizes, max_size=1400):
+    '''Randomly scale image shorter side to one of the sizes.
+
+    If boxes is not None, resize boxes accordingly.
+
+    Args:
+      img: (PIL.Image) image to be resized.
+      boxes: (tensor) object boxes, sized [#obj,4].
+      sizes: (tuple) scale sizes.
+      max_size: (int) limit the image longer size to max_size.
+
+    Returns:
+      img: (PIL.Image) resized image.
+      boxes: (tensor) resized boxes.
+    '''
+    w, h = img.size
+    size_min = min(w,h)
+    size_max = max(w,h)
+    size = random.choice(sizes)
+    sw = sh = float(size) / size_min
+    if sw * size_max > max_size:
+        sw = sh = float(max_size) / size_max
+
+    ow = int(w * sw + 0.5)
+    oh = int(h * sh + 0.5)
+    img = img.resize((ow,oh), Image.BILINEAR)
+
+    if boxes is not None:
+        boxes = boxes * torch.tensor([sw,sh,sw,sh])
+    return img, boxes
@@ -0,0 +1,27 @@
+import math
+import torch
+import random
+
+from PIL import Image
+from ..box import box_iou, box_clamp
+
+
+def crop_box_lm(img, boxes, labels, linemap, box):
+    x, y, x2, y2 = box
+    w = x2 - x
+    h = y2 - y
+    img = img.crop((x, y, x2, y2))
+    linemap = linemap.crop((x, y, x2, y2))
+
+    # check if center is still inside tile_box, otherwise ignore box
+    # (if center is not inside tile box, not possible to get IoU >= 0.5 --> treated as background anyways)
+    center = (boxes[:, :2] + boxes[:, 2:]) / 2
+    mask = (center[:, 0] >= x) & (center[:, 0] <= x2) & (center[:, 1] >= y) & (center[:, 1] <= y2)
+    if mask.any():
+        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
+        boxes = box_clamp(boxes, 0, 0, w, h)
+        labels = labels[mask]
+    else:
+        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
+        labels = torch.tensor([0], dtype=torch.long)
+    return img, boxes, labels, linemap
@@ -0,0 +1,26 @@
+import torch
+import random
+
+from PIL import Image
+
+
+def pad_lm(img, linemap, target_size):
+    '''Pad image and line map with zeros to the specified size.
+
+    Args:
+      img: (PIL.Image) image to be padded.
+      linemap: (PIL.Image) line map to be padded the same way.
+      target_size: (tuple) target size of (ow,oh).
+
+    Returns:
+      (PIL.Image, PIL.Image) padded image (mode 'L') and padded line map (mode '1').
+
+    Reference: `tf.image.pad_to_bounding_box`
+    '''
+    w, h = img.size
+    canvas = Image.new('L', target_size)
+    canvas.paste(img, (0, 0))  # paste at the top-left corner
+
+    canvas_line = Image.new('1', target_size)
+    canvas_line.paste(linemap, (0, 0))  # paste at the top-left corner
+    return canvas, canvas_line
@@ -0,0 +1,56 @@
+'''This random crop strategy is described in paper:
+   [1] SSD: Single Shot MultiBox Detector
+'''
+import math
+import torch
+import random
+
+from PIL import Image
+# from torchcv.utils.box import box_iou, box_clamp
+from ..box import box_iou, box_clamp
+
+
+def random_crop_tile_lm(
+        img, boxes, labels, linemap,
+        scale_range=(0.8, 1),
+        max_aspect_ratio=2.):
+    '''Randomly crop a PIL image and its line map with the same window.
+
+    Args:
+      img, linemap: (PIL.Image) image and its line map.
+      boxes: (tensor) bounding boxes, sized [#obj, 4].
+      labels: (tensor) bounding box labels, sized [#obj,].
+      scale_range: (float, float) min and max image width/height scale.
+      max_aspect_ratio: (float) maximum width/height aspect ratio.
+
+    Returns:
+      img: (PIL.Image) cropped image.
+      boxes, labels: (tensor) object boxes and labels whose centers lie in the crop.
+      linemap: (PIL.Image) cropped line map.
+    '''
+    imw, imh = img.size
+
+    scale = random.uniform(scale_range[0], scale_range[1])
+    aspect_ratio = random.uniform(
+        max(1 / max_aspect_ratio, scale * scale),
+        min(max_aspect_ratio, 1 / (scale * scale)))
+    w = int(imw * scale * math.sqrt(aspect_ratio))
+    h = int(imh * scale / math.sqrt(aspect_ratio))
+
+    x = random.randint(0, imw - w)  # inclusive; randrange(0) raises when w == imw
+    y = random.randint(0, imh - h)
+
+    img = img.crop((x, y, x + w, y + h))
+    linemap = linemap.crop((x, y, x + w, y + h))
+
+    center = (boxes[:, :2] + boxes[:, 2:]) / 2
+    mask = (center[:, 0] >= x) & (center[:, 0] <= x + w) \
+           & (center[:, 1] >= y) & (center[:, 1] <= y + h)
+    if mask.any():
+        boxes = boxes[mask] - torch.tensor([x, y, x, y], dtype=torch.float)
+        boxes = box_clamp(boxes, 0, 0, w, h)
+        labels = labels[mask]
+    else:
+        boxes = torch.tensor([[0, 0, 0, 0]], dtype=torch.float)
+        labels = torch.tensor([0], dtype=torch.long)
+    return img, boxes, labels, linemap
@@ -0,0 +1,62 @@
+import torch
+import random
+
+from PIL import Image
+
+
+def resize_lm(img, boxes, linemap, size, max_size=1000, scale=None, random_interpolation=False):
+    '''Resize the input PIL image and its line map to given size.
+
+    If boxes is not None, resize boxes accordingly.
+
+    Args:
+      img: (PIL.Image) image to be resized.
+      boxes: (tensor) object boxes, sized [#obj,4].
+      linemap: (PIL.Image) line map, resized with nearest-neighbor interpolation.
+      size: (tuple or int) a tuple is the target (w,h); an int is the target length
+        of the shorter side, keeping the aspect ratio.
+      max_size: (int) when size is int, cap the longer side at max_size (limits GPU memory use).
+      scale: (float) if not None, scale both sides by this factor; size and max_size are ignored.
+      random_interpolation: (bool) randomly choose a resize interpolation method for img.
+
+    Returns:
+      img: (PIL.Image) resized image.
+      boxes, linemap: (tensor) resized boxes, then (PIL.Image) resized line map.
+
+    Example:
+    >> img, boxes, linemap = resize_lm(img, boxes, linemap, 600)  # resize shorter side to 600
+    >> img, boxes, linemap = resize_lm(img, boxes, linemap, (500,600))  # resize image size to (500,600)
+    >> img, _, linemap = resize_lm(img, None, linemap, (500,600))  # resize image and line map only
+    '''
+    w, h = img.size
+    if scale is None:
+        if isinstance(size, int):
+            size_min = min(w, h)
+            size_max = max(w, h)
+            sw = sh = float(size) / size_min
+            if sw * size_max > max_size:
+                sw = sh = float(max_size) / size_max
+            ow = int(w * sw + 0.5)
+            oh = int(h * sh + 0.5)
+        else:
+            ow, oh = size
+            sw = float(ow) / w
+            sh = float(oh) / h
+    else:
+        ow = int(w * scale)
+        oh = int(h * scale)
+        sw, sh = scale, scale
+
+    method = random.choice([
+        Image.BOX,
+        Image.NEAREST,
+        Image.HAMMING,
+        Image.BICUBIC,
+        Image.LANCZOS,
+        Image.BILINEAR]) if random_interpolation else Image.BILINEAR
+    img = img.resize((ow, oh), method)
+    linemap = linemap.resize((ow, oh), Image.NEAREST)
+
+    if boxes is not None:
+        boxes = boxes * torch.tensor([sw, sh, sw, sh])
+    return img, boxes, linemap
@@ -0,0 +1,340 @@
+import numbers
+import numpy as np
+from PIL import Image
+from PIL import ImageOps
+import random
+from random import randint
+
+import torch.nn.functional as F
+# from skimage.transform import warp, AffineTransform
+
+from bbox_utils import intersection_over_union
+
+
+# own stuff
+
+def convert2binaryPIL(lbl_ind):
+    # convert to PIL binary '1' without dither
+    lbl_im = Image.fromarray(np.uint8(lbl_ind))
+    fn = (lambda x: 255 if x > 0 else 0)
+    lbl_im = lbl_im.convert('L').point(fn, mode='1')
+    return lbl_im
+
+def pad2square(bb, context_pad_ratio=0, context_pad=0, take_long_side=True):
+    # -- extract square patches using ground truth bounding boxes
+    # assert (context_pad >= 0 and context_pad_ratio == 0) or (context_pad_ratio >= 0 and context_pad == 0)
+    width = bb[2] - bb[0]
+    height = bb[3] - bb[1]
+    diff = width - height
+    width_is_smaller = 0 > diff
+    height_is_smaller = 0 < diff
+    if take_long_side:
+        # take long side
+        if context_pad == 0:
+            if width_is_smaller:
+                context_pad = np.round(context_pad_ratio * height)
+            else:
+                context_pad = np.round(context_pad_ratio * width)
+        bb[0] = bb[0] - context_pad - (width_is_smaller * np.ceil(0.5 * (height - width)))
+        bb[2] = bb[2] + context_pad + (width_is_smaller * np.floor(0.5 * (height - width)))
+        bb[1] = bb[1] - context_pad - (height_is_smaller * np.ceil(0.5 * (width - height)))
+        bb[3] = bb[3] + context_pad + (height_is_smaller * np.floor(0.5 * (width - height)))
+    else:
+        # take small side
+        if context_pad == 0:
+            if width_is_smaller:
+                context_pad = np.round(context_pad_ratio * width)
+            else:
+                context_pad = np.round(context_pad_ratio * height)
+        bb[0] = bb[0] - context_pad - (height_is_smaller * np.ceil(0.5 * (height - width)))
+        bb[2] = bb[2] + context_pad + (height_is_smaller * np.floor(0.5 * (height - width)))
+        bb[1] = bb[1] - context_pad - (width_is_smaller * np.ceil(0.5 * (width - height)))
+        bb[3] = bb[3] + context_pad + (width_is_smaller * np.floor(0.5 * (width - height)))
+
+    return bb
+
+
+# BBOX sampling / cropping functions
+
+def crop_image(im, bb, context_pad=0, pad_to_square=False, mean_values=[0, 0, 0]):
+    """
+    Crop a window from a numpy image (H x W x 3) for detection.
+
+    If `pad_to_square` is set, the box is extended to a square (plus
+    `context_pad` pixels of context) and regions outside the image are
+    filled with `mean_values`.
+
+    bb: bounding box coordinates as xmin, ymin, xmax, ymax.
+    """
+
+    # copy list and use as ndarray
+    bb = np.array(bb, dtype=int)  # list(bb)
+
+    imh, imw = im.shape[:2]  # numpy images are (height, width, channels)
+
+    # pad to square while preserving aspect ratio
+    if pad_to_square:
+        bb = pad2square(bb, context_pad=context_pad)
+
+        # -- check whether bbox inside image
+        # pad: [x_min, y_min, x_max, y_max]
+
+        pad = [0, 0, 0, 0]
+        if (bb[0] < 0):
+            pad[0] = abs(bb[0])
+            bb[0] = 0
+        if (bb[1] < 0):
+            pad[1] = abs(bb[1])
+            bb[1] = 0
+        if (bb[2] > imw):
+            pad[2] = bb[2] - imw
+            bb[2] = imw
+        if (bb[3] > imh):
+            pad[3] = bb[3] - imh
+            bb[3] = imh
+
+        # -- crop, then pad with the channel means where the box left the image
+        im = im[bb[1]:bb[3], bb[0]:bb[2], :]
+
+        channel_mean = np.reshape(mean_values, (1, 1, 3)).astype(np.uint8)
+        if pad[0] > 0:
+            pad_left = np.tile(channel_mean, (im.shape[0], pad[0], 1))
+            im = np.concatenate((pad_left, im), axis=1)
+        if pad[1] > 0:
+            pad_up = np.tile(channel_mean, (pad[1], im.shape[1], 1))
+            im = np.concatenate((pad_up, im), axis=0)
+        if pad[2] > 0:
+            pad_right = np.tile(channel_mean, (im.shape[0], pad[2], 1))
+            im = np.concatenate((im, pad_right), axis=1)
+        if pad[3] > 0:
+            pad_down = np.tile(channel_mean, (pad[3], im.shape[1], 1))
+            im = np.concatenate((im, pad_down), axis=0)
+
+        return im, bb.tolist()
+    else:
+        if context_pad > 0:
+            # context padding without squaring is only supported by crop_pil_image
+            raise NotImplementedError('use crop_pil_image for context_pad > 0')
+        # return simple crop
+        return im[bb[1]:bb[3], bb[0]:bb[2], :]
+
+
+def crop_pil_image(im, bb, context_pad=0, pad_to_square=False, fill_values=None):
+    """
+    Crop a window from the image for detection. Include surrounding context
+    according to the `context_pad` configuration. Creates square crop which
+    respects the aspect ratio.
+
+    bb: bounding box coordinates as xmin, ymin, xmax, ymax.
+    """
+
+    # copy list and use as ndarray
+    bb = np.array(bb, dtype=int)  # list(bb)
+
+    imw, imh = im.size
+
+    # pad to square while preserving aspect ratio
+    if pad_to_square:
+        bb = pad2square(bb, context_pad=context_pad)
+
+        if fill_values is None:
+
+            # if cropped out of image range, pillow pads with zeros automatically
+            im = im.crop((bb[0], bb[1], bb[2], bb[3]))
+
+        else:
+            # check whether bbox inside image
+            # pad: [x_min, y_min, x_max, y_max]
+            pad = [0, 0, 0, 0]
+            if bb[0] < 0:
+                pad[0] = abs(bb[0])
+                bb[0] = 0
+            if bb[1] < 0:
+                pad[1] = abs(bb[1])
+                bb[1] = 0
+            if bb[2] > imw:
+                pad[2] = bb[2] - imw
+                bb[2] = imw
+            if bb[3] > imh:
+                pad[3] = bb[3] - imh
+                bb[3] = imh
+
+            # crop box
+            im = im.crop((bb[0], bb[1], bb[2], bb[3]))
+            # pad with fill_values where the box left the image
+            im = ImageOps.expand(im, border=(pad[0], pad[1], pad[2], pad[3]), fill=tuple(fill_values))
+
+        return im, bb.tolist()
+    else:
+        if context_pad > 0:
+            bb[0] = max(bb[0] - context_pad, 0)
+            bb[2] = min(bb[2] + context_pad, imw)
+            bb[1] = max(bb[1] - context_pad, 0)
+            bb[3] = min(bb[3] + context_pad, imh)
+        # return simple crop
+        return im.crop((bb[0], bb[1], bb[2], bb[3])), bb.tolist()
+
+
+def spatial_sample(im_pad, bb, spatial_sample_rng, rnd_scale_ratio=0.05):
+    im = im_pad
+    imh, imw = im.shape[:2]
+    im_bb = [0, 0, imw, imh]
+
+    # make ground truth box square, and use its dimensions
+    bb_gt = list(bb)
+    w = bb[2] - bb[0]
+    h = bb[3] - bb[1]
+    if w > h:
+        bb_gt[1] = int(bb_gt[1] - np.ceil(0.5 * (w - h)))
+        bb_gt[3] = int(bb_gt[3] + np.floor(0.5 * (w - h)))
+        h = w
+    else:
+        bb_gt[0] = int(bb_gt[0] - np.ceil(0.5 * (h - w)))
+        bb_gt[2] = int(bb_gt[2] + np.floor(0.5 * (h - w)))
+        w = h
+    # add random scaling to test bbox
+    # by treating dimension differently the aspect ratio will fluctuate a little (due to resizing afterwards!)
+    wrange = round(rnd_scale_ratio * w)
+    hrange = round(rnd_scale_ratio * h)
+    w = min(w + random.randint(-wrange, 2*wrange), imw - 1)  # ensure size is in im_pad
+    h = w  # min(h + random.randint(hrange, 2*hrange), imh - 1)  # ensure size is in im_pad
+
+    # set ranges according to provided label
+    min_IoU = spatial_sample_rng[0]
+    max_IoU = spatial_sample_rng[1]
+
+    max_iter = 500
+    curr_iter = 0
+    ratio = 0.0
+    while curr_iter < max_iter and (ratio >= max_IoU or ratio <= min_IoU):
+        curr_iter += 1
+
+        # bbox sampling
+        jxy = [randint(0, im_bb[2] - w), randint(0, im_bb[3] - h)]
+        bb_test = list([jxy[0], jxy[1], w + jxy[0], h + jxy[1]])
+
+        # check if new box fits criteria
+        if min(bb_test) >= 0 and bb_test[2] <= im.shape[1] and bb_test[3] <= im.shape[0]:
+            ratio = intersection_over_union(bb_test, bb_gt)
+
+    if max_IoU >= ratio >= min_IoU:
+        im = im[bb_test[1]:bb_test[3], bb_test[0]:bb_test[2], :]
+        # new_bb_gt = [bb_gt[0] - bb_test[0], bb_gt[1] - bb_test[1],  bb_gt[2] - bb_test[0], bb_gt[3] - bb_test[1]]
+        new_bb_gt = bb_gt
+    else:
+        im = im
+        new_bb_gt = bb_gt
+
+        # DEBUG_MODE = False
+        # if DEBUG_MODE:
+        #     print "tricky box", w, h, imw, imh
+
+    return im, new_bb_gt, bb_test
+
+
+
+# TRANSFORMS
+
+
+class MyRandomZoom(object):
+    def __init__(self, scale_range, interpolation=Image.BILINEAR):
+        self.scale_range = scale_range
+        self.interpolation = interpolation
+
+    def __call__(self, img):
+        scale = np.random.uniform(*self.scale_range)
+        new_size = (int(img.height * scale), int(img.width * scale))
+        return F.resize(img, new_size, self.interpolation)
+
+
+class MyFuzzyZoom(object):
+    """
+    :param target_size: (2-tuple) height, width
+    :param scale_range: (2-tuple) range from which target_size may deviate
+    :param interpolation: ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional)
+    """
+    def __init__(self, target_size, scale_range, interpolation=Image.BILINEAR):
+
+        self.target_size = target_size
+        self.scale_range = scale_range
+        self.interpolation = interpolation
+
+    @staticmethod
+    def get_params(scale_range):
+        return np.random.uniform(*scale_range)
+
+    def __call__(self, img):
+        scale = self.get_params(self.scale_range)
+        new_size = (int(self.target_size[0] * scale), int(self.target_size[1] * scale))
+        return F.resize(img, new_size, self.interpolation)
+
+
+class MyRandomChoiceZoom(object):
+    def __init__(self, scales, p=None, interpolation=Image.BILINEAR):
+        self.scales = scales
+        self.interpolation = interpolation
+        self.p = p
+
+    def __call__(self, img):
+        scale = np.random.choice(self.scales, replace=True, p=self.p)
+        new_size = (int(img.height * scale), int(img.width * scale))
+        return F.resize(img, new_size, self.interpolation)
+
+
+class MyRandomCenteredRotation(object):
+    """
+    Args:
+        degrees (sequence or float or int): Range of degrees to select from.
+            If degrees is a number instead of sequence like (min, max), the range of degrees
+            will be (-degrees, +degrees).
+        translation_range (2-tuple): Range of pixels to select from.
+            The center of rotation is shifted according to a number sampled from this range.
+        resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional):
+            An optional resampling filter.
+            See http://pillow.readthedocs.io/en/3.4.x/handbook/concepts.html#filters
+            If omitted, or if the image has mode "1" or "P", it is set to PIL.Image.NEAREST.
+    """
+    def __init__(self, degrees, translation_range=(-3, 3), resample=Image.BILINEAR):
+        if isinstance(degrees, numbers.Number):
+            if degrees < 0:
+                raise ValueError("If degrees is a single number, it must be positive.")
+            self.degrees = (-degrees, degrees)
+        else:
+            if len(degrees) != 2:
+                raise ValueError("If degrees is a sequence, it must be of len 2.")
+            self.degrees = degrees
+        self.translation_range = translation_range
+        self.resample = resample
+
+    def __call__(self, img):
+
+        angle = np.random.uniform(*self.degrees)
+        translated_center = None
+        if self.translation_range:
+            # center is given as (x, y), i.e. (column, row)
+            translated_center = (
+                np.random.uniform(*self.translation_range) + int(img.width/2),
+                np.random.uniform(*self.translation_range) + int(img.height/2)
+            )
+        return F.rotate(img, angle, resample=self.resample, expand=False, center=translated_center)
+
+
+class UnNormalize(object):
+    def __init__(self, mean, std):
+        self.mean = mean
+        self.std = std
+
+    def __call__(self, tensor):
+        """
+        Args:
+            tensor (Tensor): Normalized tensor image of size (C, H, W).
+        Returns:
+            Tensor: Unnormalized image (the input tensor is modified in place).
+        """
+        for t, m, s in zip(tensor, self.mean, self.std):
+            t.mul_(s).add_(m)
+            # The normalize code -> t.sub_(m).div_(s)
+        return tensor
+
+
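The squaring rule used by `pad2square` in the file above (long-side case) is easiest to see on concrete boxes. This is a minimal sketch under stated assumptions: `pad_to_square` is a hypothetical stand-alone helper, not part of the repository, and it returns a new list instead of modifying its input. When the side difference is odd, the extra pixel goes to the min side (`ceil` on the min side, `floor` on the max side), as in the original.

```python
import math


def pad_to_square(box, context_pad=0):
    """Grow the shorter side of [xmin, ymin, xmax, ymax] to match the longer one."""
    xmin, ymin, xmax, ymax = box
    width, height = xmax - xmin, ymax - ymin
    if width < height:
        gap = height - width
        xmin -= math.ceil(0.5 * gap)
        xmax += math.floor(0.5 * gap)
    elif height < width:
        gap = width - height
        ymin -= math.ceil(0.5 * gap)
        ymax += math.floor(0.5 * gap)
    # the same context margin is added on all four sides
    return [xmin - context_pad, ymin - context_pad,
            xmax + context_pad, ymax + context_pad]


if __name__ == "__main__":
    print(pad_to_square([10, 20, 50, 40]))                 # wide box
    print(pad_to_square([0, 0, 5, 10]))                    # tall box, odd gap
    print(pad_to_square([10, 20, 50, 40], context_pad=2))  # with context
```

The squared box can extend past the image border (negative coordinates in the second call), which is why the crop functions clip the box and pad the crop afterwards.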
@@ -0,0 +1,62 @@
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.spatial.distance import pdist, cdist, squareform
+
+
+def show_lines_tl_alignment(lbl_ind_x, center_im, line_hypos, color):
+    # Generating figure 1
+    fig, axes = plt.subplots(1, 2, figsize=(15, 6),   # (25, 10)
+                             subplot_kw={'adjustable': 'box'})
+    ax = axes.ravel()
+
+    ax[0].imshow(center_im, cmap='gray')
+    ax[0].set_title('Input image')
+    ax[0].set_axis_off()
+
+    ax[1].imshow(lbl_ind_x, cmap='gray')
+
+    for idx, line_rec in line_hypos.groupby('label').mean().iterrows():
+        angle = line_rec.angle
+        dist = line_rec.dist
+        y0 = (dist - 0 * np.cos(angle)) / np.sin(angle)
+        y1 = (dist - lbl_ind_x.shape[1] * np.cos(angle)) / np.sin(angle)
+        ax[1].plot((0, lbl_ind_x.shape[1]), (y0, y1), '-', color=color[int(idx)], linewidth=2)
+        ax[1].text(0, y0, '{}'.format(int(line_rec.tl_line)),
+                   bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
+
+    ax[1].set_xlim((0, lbl_ind_x.shape[1]))
+    ax[1].set_ylim((lbl_ind_x.shape[0], 0))
+    ax[1].set_axis_off()
+    ax[1].set_title('Detected lines / Assigned tl line idx')
+
+
+def show_score_mats_with_paths(assigned_tl_indices, hypo_line_indices, tl_line_indices, line_frag):
+    # One panel per score matrix: weak, RANSAC and line matching
+    fig, axes = plt.subplots(1, 3, figsize=(15, 6),
+                             subplot_kw={'adjustable': 'box'})
+    ax = axes.ravel()
+
+    # weak score
+    X_dist = cdist(assigned_tl_indices.reshape(-1, 1), assigned_tl_indices.reshape(-1, 1),
+                   lambda a_idx, b_idx: line_frag.compute_weak_score(a_idx.squeeze(), b_idx.squeeze()))
+    ax[0].imshow(X_dist, cmap='gray')
+    ax[0].set_title('weak score')
+    print(np.diag(X_dist))
+
+    # ransac score
+    # X_dist = cdist(assigned_tl_indices.reshape(-1, 1), assigned_tl_indices.reshape(-1, 1),
+    X_dist = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
+                   lambda a_idx, b_idx: line_frag.compute_ransac_score(a_idx.squeeze(), b_idx.squeeze(),
+                                                                       max_dist_thresh=2, dist_weight=1))  # 5/5, 4/1
+    ax[1].imshow(X_dist, cmap='gray_r')
+    ax[1].set_title('ransac score')
+    print(np.diag(X_dist))
+
+    # line matching score
+    X_dist = cdist(hypo_line_indices.reshape(-1, 1), tl_line_indices.reshape(-1, 1),
+                   lambda a_idx, b_idx: line_frag.compute_line_matching_score(a_idx.squeeze(), b_idx.squeeze()))
+    ax[2].imshow(X_dist, cmap='gray_r')  # vmin=0, vmax=1
+    ax[2].set_title('line matching score')
+    print(np.diag(X_dist))
+
+
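The line drawing in `show_lines_tl_alignment` above solves the Hough normal form `x*cos(angle) + y*sin(angle) = dist` for `y` at the left and right image border. The following is a minimal sketch, assuming that `angle` and `dist` follow this convention; `line_endpoints` is a hypothetical helper name, and the guard against vertical lines is an addition that the original does not have.

```python
import math


def line_endpoints(angle, dist, width):
    """Return ((x0, y0), (x1, y1)) for the line x*cos(angle) + y*sin(angle) = dist."""
    sin_a = math.sin(angle)
    if abs(sin_a) < 1e-12:
        # a vertical line cannot be written as y(x); this guard is not in the original
        raise ValueError("vertical line: y is not a function of x")
    y0 = (dist - 0 * math.cos(angle)) / sin_a
    y1 = (dist - width * math.cos(angle)) / sin_a
    return (0, y0), (width, y1)


if __name__ == "__main__":
    # a horizontal text line 5 px below the top edge of a 100 px wide image
    print(line_endpoints(math.pi / 2, 5.0, 100))
```

Text lines on a tablet are close to horizontal, so `sin(angle)` stays well away from zero in practice.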
@@ -0,0 +1,100 @@
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy import ndimage as ndi
+
+
+def show_line_skeleton(lbl_ind_x, skeleton):
+    # display results
+    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 4), sharex=True, sharey=True,
+                             subplot_kw={'adjustable': 'box'})
+    ax = axes.ravel()
+
+    ax[0].imshow(lbl_ind_x, cmap=plt.cm.gray)
+    ax[0].axis('off')
+    ax[0].set_title('original', fontsize=20)
+
+    ax[1].imshow(skeleton, cmap=plt.cm.gray)
+    ax[1].axis('off')
+    ax[1].set_title('skeleton', fontsize=20)
+
+    ax[2].imshow(ndi.label(skeleton, structure=np.ones((3, 3)))[0], cmap=plt.cm.nipy_spectral)
+    ax[2].axis('off')
+    ax[2].set_title('labeled skeleton', fontsize=20)
+
+    fig.tight_layout()
+
+
+def show_hough_transform_w_lines(lbl_ind_x, center_im, h, theta, d, line_hypos, color):
+    # Generating figure 1
+    fig, axes = plt.subplots(1, 3, figsize=(15, 6),
+                             subplot_kw={'adjustable': 'box'})  # (25, 15)
+    ax = axes.ravel()
+
+    ax[0].imshow(center_im, cmap='gray')
+    ax[0].set_title('Input image')
+    ax[0].set_axis_off()
+
+    ax[1].imshow(np.log(1 + h),
+                 extent=[np.rad2deg(theta[-1]), np.rad2deg(theta[0]), d[-1], d[0]],
+                 cmap='gray', aspect=1 / 1.5)
+    ax[1].set_title('Hough transform')
+    ax[1].set_xlabel('Angles (degrees)')
+    ax[1].set_ylabel('Distance (pixels)')
+    ax[1].axis('image')
+
+    ax[2].imshow(lbl_ind_x, cmap='gray')
+
+    for idx, line_rec in line_hypos.groupby('label').mean().iterrows():
+        angle = line_rec.angle
+        dist = line_rec.dist
+        y0 = (dist - 0 * np.cos(angle)) / np.sin(angle)
+        y1 = (dist - lbl_ind_x.shape[1] * np.cos(angle)) / np.sin(angle)
+        ax[2].plot((0, lbl_ind_x.shape[1]), (y0, y1), '-', color=color[int(idx)], linewidth=2)
+
+    ax[2].set_xlim((0, lbl_ind_x.shape[1]))
+    ax[2].set_ylim((lbl_ind_x.shape[0], 0))
+    ax[2].set_axis_off()
+    ax[2].set_title('Detected lines')
+
+    # ax[2].imshow(lbl_ind, cmap='gray')
+    # ax[2].set_title('Input image')
+    # ax[2].set_axis_off()
+
+
+def show_probabilistic_hough(lbl_ind_x, center_im, line_segs, ls_labels, group2line, color):
+    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
+    ax = axes.ravel()
+
+    ax[0].imshow(center_im, cmap='gray')
+    ax[0].set_title('Input image')
+
+    # ax[1].imshow(lbl_ind_x, cmap='gray')
+    # ax[1].set_title('line det')
+
+    ax[1].imshow(lbl_ind_x * 0)
+    for line, li in zip(line_segs, ls_labels):
+        p0, p1 = line
+        ax[1].plot((p0[0], p1[0]), (p0[1], p1[1]), color=color[int(group2line[li])], linewidth=2)
+        ax[1].text(p0[0], p0[1], '{}'.format(group2line[li]),
+                   bbox=dict(facecolor='blue', alpha=0.5), fontsize=8, color='white')
+    ax[1].set_xlim((0, lbl_ind_x.shape[1]))
+    ax[1].set_ylim((lbl_ind_x.shape[0], 0))
+    ax[1].set_title('Probabilistic Hough')
+
+
+def show_line_segms(image_label_overlay, segm_labels):
+    fig, axes = plt.subplots(1, 2, figsize=(15, 9))  # 25, 15
+    ax = axes.ravel()
+
+    ax[0].imshow(image_label_overlay, cmap='gray')
+    ax[0].set_title('Input image')
+
+    ax[1].imshow(segm_labels)
+    ax[1].set_title('Line segments')
+
+
+
+
+
+
+
--- a/Mostrar Mais
+++ b/Mostrar Mais
				`@@ -0,0 +1 @@`
				[0, 2, 0, 191, 0, 184, 196, 238, 239, 40, 26, 221, 240, 241, 24, 109, 73, 236, 210, 0, 205, 0, 242, 0, 58, 0, 133, 0, 0, 243, 0, 199, 244, 0, 0, 0, 0, 245, 0, 0, 0, 0, 0, 0, 246, 0, 0, 0, 0, 247, 0, 0, 0, 0, 0, 0, 0, 248, 0, 0, 0, 249, 0, 0, 250, 208, 0, 0, 0, 0, 0, 193, 0, 251, 0, 252, 0, 0, 0, 162, 154, 0, 0, 0, 66, 23, 48, 0, 0, 108, 228, 129, 140, 0, 0, 0, 0, 253, 41, 97, 0, 254, 0, 0, 0, 255, 256, 0, 257, 258, 4, 8, 17, 31, 0, 202, 0, 224, 83, 213, 49, 259, 260, 0, 0, 0, 0, 20, 0, 175, 207, 261, 90, 0, 6, 0, 156, 93, 189, 152, 110, 7, 76, 64, 0, 0, 0, 0, 145, 262, 263, 194, 264, 265, 0, 0, 0, 266, 0, 0, 267, 0, 29, 0, 78, 268, 142, 231, 269, 0, 63, 0, 89, 270, 271, 272, 34, 273, 186, 0, 101, 107, 274, 275, 104, 0, 0, 0, 276, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 277, 0, 278, 0, 0, 0, 69, 0, 279, 0, 0, 280, 0, 0, 281, 0, 0, 0, 0, 0, 176, 215, 116, 0, 0, 0, 0, 0, 0, 282, 0, 283, 0, 0, 0, 284, 0, 157, 0, 0, 0, 61, 0, 0, 0, 232, 206, 5, 0, 0, 0, 80, 124, 222, 36, 0, 0, 183, 195, 84, 160, 237, 0, 0, 0, 119, 0, 0, 0, 285, 71, 0, 0, 0, 229, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 182, 0, 0, 132, 286, 0, 0, 287, 115, 52, 0, 197, 233, 209, 0, 0, 0, 0, 0, 0, 200, 0, 204, 288, 46, 0, 0, 289, 0, 0, 0, 0, 0, 0, 0, 290, 0, 50, 0, 0, 0, 0, 0, 291, 0, 0, 0, 292, 0, 293, 192, 294, 295, 0, 0, 0, 0, 0, 0, 296, 0, 25, 120, 297, 212, 123, 0, 146, 134, 21, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 298, 299, 0, 0, 0, 300, 141, 44, 62, 45, 0, 0, 0, 0, 0, 138, 0, 0, 0, 0, 301, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 302, 0, 0, 118, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 303, 0, 0, 0, 304, 305, 0, 0, 306, 0, 165, 307, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 308, 0, 0, 0, 113, 309, 137, 0, 0, 28, 0, 0, 310, 0, 159, 0, 0, 0, 0, 0, 0, 0, 0, 181, 99, 158, 311, 0, 0, 0, 30, 102, 0, 74, 177, 3, 126, 312, 19, 67, 188, 130, 128, 313, 178, 0, 0, 163, 0, 314, 0, 42, 315, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 143, 0, 0, 0, 0, 225, 72, 
0, 139, 223, 0, 0, 316, 57, 317, 318, 319, 13, 35, 320, 321, 217, 322, 179, 190, 121, 65, 150, 0, 323, 324, 148, 96, 0, 0, 226, 325, 0, 326, 327, 0, 219, 328, 98, 43, 87, 0, 0, 234, 329, 112, 0, 60, 0, 18, 198, 136, 330, 0, 0, 331, 39, 0, 155, 27, 0, 92, 0, 0, 0, 0, 0, 0, 332, 0, 0, 333, 105, 0, 334, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 218, 0, 0, 94, 135, 0, 0, 103, 174, 0, 0, 0, 75, 32, 0, 201, 187, 0, 335, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 0, 211, 0, 0, 0, 0, 0, 0, 0, 0, 122, 0, 0, 0, 0, 0, 167, 0, 0, 22, 168, 0, 0, 230, 12, 0, 0, 336, 85, 166, 0, 227, 0, 53, 337, 0, 37, 0, 0, 185, 0, 338, 339, 171, 0, 0, 173, 0, 0, 86, 340, 0, 153, 0, 0, 0, 0, 341, 216, 342, 343, 0, 79, 180, 144, 0, 0, 125, 0, 161, 169, 9, 0, 0, 77, 100, 0, 0, 0, 344, 0, 0, 214, 131, 55, 95, 14, 0, 47, 345, 0, 81, 56, 117, 106, 0, 0, 0, 235, 33, 0, 0, 0, 0, 346, 347, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 348, 0, 349, 0, 0, 0, 0, 0, 0, 350, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 114, 38, 351, 352, 0, 82, 353, 0, 164, 354, 0, 355, 0, 0, 172, 0, 0, 356, 51, 357, 358, 88, 0, 359, 0, 360, 127, 54, 361, 147, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 362, 0, 363, 0, 111, 0, 0, 1, 0, 220, 364, 365, 366, 0, 0, 0, 367, 10, 0, 0, 0, 0, 0, 0, 0, 368, 0, 0, 0, 369, 0, 170, 203, 0, 0, 151, 0, 91, 0, 370, 371, 372, 0, 0, 0, 0, 0, 149, 59, 0, 0, 0, 0, 373, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 374, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]