Imgur
CUHK-03 dataset is a baseline dataset for person re-identification tasks, published on 2014. The original file is a 1.73GB .mat file which compress all images and testsets settings together. Because my project is mainly based on Python and Tensorflow/Keras. Therefore, I need to decompose the original file to .jpg, then retrain them in the GoogLeNet by Tensorflow.

I searched on the Internet before and there is no specific article about how to extract images from the mat by Python. Hope this blog can help the people who are interested in person re-id as well ;).

Dataset Structure Analysis

There are 3 cells in the .mat file: detected,labeled and testsets.
cuhk

detected: bounding box detected by SOTA algorithm.
labeled: manually drawn ground truth bounding box
testsets: 20 random testset settings.

Detected

There structure of detected is:

For example, 843 means the number of identities. 10 is the number of frames. The structure in the 843*10 cell as shown below:
Imgur
There are some empty value in it so we need to ignore them when extract the images. The structure of element in it as shown below:
detected2
It is obvious that each element is a standard image (HeightWidthRGB).

Labeled

The structure of Labeled is same as the detected. Noticed that the images in detected and labeled are nearly same. The only difference is the size of image and it doesn’t matter. Therefore, I just extract the images from the detected.

Testsets

The structure of testsets as shown below:
testset
There are 20 different testset settings in it. Each setting contains 100 random identities (id number < 843) with corresponding camera(id from 1 to 3).

Extract images by Python

To extract .mat, first we need h5py to import it.

import h5py
f=h5py.File("/Users/typewind/Downloads/cuhk03_release/cuhk-03.mat")
variables=f.items()
cuhk03={}
for var in variables:
    name=var[0]
    data=var[1]
    cuhk03[name]=data
print(cuhk03)

Then we get the print output:

{'#refs#': <HDF5 group "/#refs#" (28224 members)>, 'detected': <HDF5 dataset "detected": shape (1, 5), type "|O">, 'labeled': <HDF5 dataset "labeled": shape (1, 5), type "|O">, 'testsets': <HDF5 dataset "testsets": shape (1, 20), type "|O">}

Each data here is a reference(or pointer). Then we can extract the testsets first and write the testsets into .txt file.

for i in range(20):
    testset = f[cuhk03["testsets"][0][i]]
    camera=testset[0]
    identity=testset[1]
    new_camera=[]
    new_identity=[]
    for c in camera:
        buffer_c="0"*(2-len(str(int(c))))+str(int(c))
        new_camera.append(buffer_c)
    for i in identity:
        buffer_i="0"*(4-len(str(int(i))))+str(int(i))
        new_identity.append(buffer_i)
    with open(path_camera, 'a') as file:
        file.write(str(new_camera))
    with open(path_id,'a') as file:
        file.write(str(new_identity))

Then we can extract the images like this:

def get_image(data,path,shift):
    for camera in range(10):
        for frame in range(len(data[0])):
            if len(f[data[camera][frame]])!=2:
                rgb=np.array([f[data[camera][frame]][0],
                          f[data[camera][frame]][1],
                          f[data[camera][frame]][2]])

                #print(camera)
                #print(frame)
                rgb=rgb.transpose(2,1,0)
                img=Image.fromarray(rgb,"RGB")
                img=img.resize((64,128))
                frame_id="0"*(4-len(str(frame+shift)))+str(frame+shift)
                camera_id="0"*(2-len(str(camera)))+str(camera)
                img.save(path+frame_id+"_"+camera_id+".jpg")

The data here means the data in the CUHK-03 dataset. The path is the output path. The shift is the shift of number of identities (they divide the identities into 843, 440, 77, 58, 49). Actually I can merge all of them and extract the images without shift variable. But just in case that if I need to apply my algorithm by this partition.

I resized the images to 128*64 and set the file name as 0000_00.jpg because I want to follow the similar setting of Market-1501 dataset. You can change to what you want.

Therefore, I load the mat data by this partition setting first:

detected_843=f[cuhk03["detected"][0][0]]
detected_440=f[cuhk03["detected"][0][1]]
detected_77=f[cuhk03["detected"][0][2]]
detected_58=f[cuhk03["detected"][0][3]]
detected_49=f[cuhk03["detected"][0][4]]

Then extract and output the images by these lines:

get_image(detected_843,path_detected,0)
get_image(detected_440, path_detected,843)
get_image(detected_77,path_detected,440+843)
get_image(detected_58,path_detected,440+77+843)
get_image(detected_49,path_detected,440+77+843+58)

Here is a sample output:
output

You can view the full code here