
16 November, 2023
Do you remember the batch of 4 pictures generated from the Ommik website that I published less than a month ago? They were all low quality and glitchy because they came from an older version of the Dall-E model. This Monday, OpenAI announced what I had been waiting for: the newest Dall-E-3 model is finally accessible via the API. Here are the pictures I generated from the Ommik website using the new model. Big difference, no?




So, what is missing, and what challenges am I facing right now?
Even though OpenAI's models can technically detect and classify objects in an image, this is not possible via the API, and in fact the API's capabilities are quite limited. Currently, the only 3 things Dall-E can do via the API are:
- Text-to-image generation
- Image editing based on a provided mask
- Image variations
None of them is advanced enough on its own to make the Ommik service work. Therefore, I need other solutions.
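To make the three operations above concrete, here is a minimal sketch of how they look with the official `openai` Python package (version 1.x). The function names, the fixed `1024x1024` size, and the idea of passing the client in as a parameter are illustrative choices, not part of Ommik. One assumption to flag loudly: as far as I know, at the time of writing the edit and variation endpoints accept only the older `dall-e-2` model, so only the first function uses `dall-e-3`.

```python
def generate_image(client, prompt):
    """Text to image: request one new picture and return its URL."""
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        n=1,  # dall-e-3 returns one image per request
        size="1024x1024",
    )
    return response.data[0].url


def edit_image(client, image_path, mask_path, prompt):
    """Mask-based edit: repaint the transparent area of the mask."""
    with open(image_path, "rb") as image, open(mask_path, "rb") as mask:
        response = client.images.edit(
            model="dall-e-2",  # assumption: edits are limited to dall-e-2
            image=image,
            mask=mask,
            prompt=prompt,
            n=1,
            size="1024x1024",
        )
    return response.data[0].url


def vary_image(client, image_path):
    """Variation: request a picture similar to the given one."""
    with open(image_path, "rb") as image:
        response = client.images.create_variation(
            model="dall-e-2",  # assumption: variations are limited to dall-e-2
            image=image,
            n=1,
            size="1024x1024",
        )
    return response.data[0].url
```

With the real package installed and an `OPENAI_API_KEY` set in the environment, the client would be created with `from openai import OpenAI` and `client = OpenAI()`, then passed to any of the three functions. Taking the client as a parameter also makes it easy to substitute a stub when testing without network access.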
Object detection, classification, and segmentation
The Dall-E-3 model cannot explicitly detect objects in an image via the API. It does something like that in the background when you generate variations of an image, but you cannot tell it, "Hey, find the desk in this picture and generate exactly the same picture, only with a different type of desk."
To make that work, you need to create a mask of the original image by erasing the part you want to change (in my example, the desk). Then you provide the API with 3 things: the original image, the mask image, and a prompt describing what kind of desk you want in the new image.
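The mask described here is just a PNG with the same dimensions as the original image, in which the erased region is fully transparent (alpha 0) and everything else is opaque. The sketch below builds such a file using only the Python standard library. It assumes the desk can be approximated by a rectangle; the function name `build_mask_png` and the box coordinates are made up for illustration.

```python
import struct
import zlib


def _chunk(kind, data):
    """Wrap data in a PNG chunk: length, type, payload, CRC."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def build_mask_png(width, height, box):
    """Return PNG bytes: opaque everywhere, transparent inside box.

    box is (left, top, right, bottom) in pixels; right and bottom are
    exclusive and must lie within the image.
    """
    left, top, right, bottom = box
    opaque = b"\x00\x00\x00\xff"  # black, alpha 255: keep this pixel
    erased = b"\x00\x00\x00\x00"  # alpha 0: area the model may repaint
    # Every PNG row starts with a filter byte; 0 means "no filter".
    plain_row = b"\x00" + opaque * width
    cut_row = (b"\x00" + opaque * left + erased * (right - left)
               + opaque * (width - right))
    rows = b"".join(cut_row if top <= y < bottom else plain_row
                    for y in range(height))
    # IHDR: width, height, bit depth 8, colour type 6 (RGBA), defaults.
    header = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(rows))
            + _chunk(b"IEND", b""))


# Hypothetical desk position inside a 1024x1024 picture.
mask_bytes = build_mask_png(1024, 1024, (300, 500, 800, 900))
```

Writing `mask_bytes` to a file such as `mask.png` gives the second of the three inputs. A rectangle is the crudest possible mask; it also erases whatever surrounds the desk inside the box, which is why a pixel-accurate outline is preferable.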
To create the mask, I will have to use machine learning to detect and classify the object (a desk) in the image, outline its exact pixels (segmentation), and erase them.
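The last step of that pipeline, erasing the segmented object, can be sketched without committing to any particular model. The example below assumes a hypothetical segmentation model has already produced one class label per pixel; the `labels` grid, the `"desk"` label, and the function name `erase_class` are all illustrative and do not come from a real library.

```python
def erase_class(pixels, labels, target="desk"):
    """Return a copy of pixels with alpha 0 wherever labels equals target.

    pixels: rows of (r, g, b, a) tuples.
    labels: rows of class names with the same shape as pixels
            (assumed output of some segmentation model).
    """
    same_shape = len(pixels) == len(labels) and all(
        len(p_row) == len(l_row) for p_row, l_row in zip(pixels, labels)
    )
    if not same_shape:
        raise ValueError("pixels and labels must have the same shape")
    return [
        [(r, g, b, 0) if label == target else (r, g, b, a)
         for (r, g, b, a), label in zip(pixel_row, label_row)]
        for pixel_row, label_row in zip(pixels, labels)
    ]


# Tiny 2x3 "image": a grey wall above, a brown desk on the lower right.
wall, wood = (200, 200, 200, 255), (120, 80, 40, 255)
pixels = [[wall, wall, wall],
          [wall, wood, wood]]
labels = [["wall", "wall", "wall"],
          ["wall", "desk", "desk"]]
masked = erase_class(pixels, labels)
```

Only the two desk pixels end up transparent; the wall is left untouched, which is what lets the edit request change the desk and nothing else.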
The Dall-E-3 model via the API is already a powerful tool, but it is still in the early stages of development. Nonetheless, everything is possible, and the technology is ready to make some insane progress.