The pipeline works the same way as before, but with one key difference: the generation model has been fine-tuned on me specifically, using a custom LoRA (Low-Rank Adaptation) trained on a curated set of my own photographs and portraits.
- Visual analysis — An uploaded photo is first read by a vision-language model (Qwen2.5-VL), which produces a detailed description of composition, lighting, and mood.
- Prompt optimization — A second model (Qwen2.5) turns that description into a clean, precise generation prompt.
- Image generation with a trained likeness — Instead of generating a generic image from the prompt, the diffusion model (Z-Image Turbo, via ComfyUI) applies my personal LoRA on top of the base model. This means the system doesn’t just interpret the uploaded photo abstractly — it reconstructs the scene through my own trained likeness, inserting my face, my features, my visual identity into the new image it creates.
- Critique — Finally, a language model — delivers a written verdict on the image, as if reviewing a portrait that is, in a strange way, of me but not by me.
The effect is unsettling by design: an upload of any photograph — mine or someone else’s — comes back reinterpreted with my own face and presence woven in by a machine that has learned what I look like, followed by that same machine condemning the result.



