The Tech Behind AWS Community Day Images

How a live GenAI Photo Booth came to life on AWS


Participants’ Experience at the GenAI Booth

From the moment attendees approached the booth, curiosity quickly turned into excitement. Most participants expected a simple photo capture, but what followed genuinely surprised them.

After taking a live photo, they watched their image transform in real time through Generative AI. The turnaround was fast enough to feel almost magical. Seeing their own face accurately preserved while the background, lighting, or style changed within seconds left many attendees amazed.

Common reactions we observed:

  • Excitement – People gathered around to see others’ results, creating a buzz at the booth.

  • Surprise – Many didn’t expect such high-quality, realistic outputs within seconds.

  • Amazement – The fact that this was happening live, at an event, without delays or glitches, made the experience memorable.

For many participants, this was their first hands-on interaction with Generative AI: not through slides or demos, but through something personal, their own photo. That personal connection made the technology feel real, approachable, and fun.

What the System Was (High-Level)

Behind the scenes, the system followed a clean, event-ready flow designed for speed, reliability, and scale:

Capture → Secure Upload → AI Edit → Store → Display

At the booth, the interaction felt instant and effortless. A photo was captured and immediately handed off to the cloud, where all the heavy lifting happened invisibly. This deliberate separation ensured that the local device stayed lightweight, responsive, and resilient even during peak crowd moments.

Capture

  • High-resolution images were captured locally with minimal on-device processing

  • The booth hardware focused only on speed and reliability, not compute

  • This avoided overheating, crashes, or performance drops during continuous usage

Secure Upload

  • Images were uploaded instantly over a secure channel

  • Authentication and access control were enforced at the cloud boundary

  • No sensitive data was stored permanently on the booth device

AI Edit

  • Once in the cloud, AI-based image enhancement and face-preserving edits were triggered automatically

  • GPU-intensive workloads ran entirely on scalable infrastructure

  • Processing could scale up or down based on real-time demand without affecting booth performance

Store

  • Edited images were stored in durable cloud storage

  • Each image was uniquely tagged for easy retrieval and tracking

  • This made it simple to reuse images later for emails, galleries, or analytics
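The "uniquely tagged" step above can be sketched as a small key-building helper. This is only an illustration: the function name `build_image_key`, the `raw`/`edited` stage names, and the key layout are assumptions made for this example, not details taken from the booth's code.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical stage names; the post does not say how objects were organized.
VALID_STAGES = {"raw", "edited"}


def build_image_key(event_slug: str, stage: str, now: Optional[datetime] = None) -> str:
    """Build a unique storage key of the form <event>/<stage>/<date>/<id>.jpg."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    now = now or datetime.now(timezone.utc)
    image_id = uuid.uuid4().hex  # random 32-character identifier
    return f"{event_slug}/{stage}/{now:%Y-%m-%d}/{image_id}.jpg"


print(build_image_key("demo-event", "edited"))
```

Grouping keys by event, stage, and date keeps later retrieval for galleries or emails a matter of listing one prefix.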

Display

  • The final image was streamed back to the booth display within seconds

  • Participants saw their enhanced photo almost instantly, maintaining excitement and engagement

  • The fast feedback loop made the experience feel “magical” despite the complex backend

The key design principle was intentional separation of concerns.

  • The booth handled interaction and capture

  • The cloud handled compute, security, and reliability

By pushing complexity into AWS and keeping the edge device simple, the system remained stable under load, easy to operate at a live event, and ready to scale without redesign. This is what allowed a smooth, high-energy participant experience while running a production-grade AI system behind the scenes.

How it was implemented

Model Choice & Quantization Rationale

For our image editing and face-retention pipeline, we selected Qwen Image Edit as the primary model, paired with 4-bit bitsandbytes (bnb) quantization. This combination offered the best balance between identity preservation, edit precision, and deployment efficiency, while also being fully open source.

Why Qwen Image Edit?

Qwen Image Edit is optimized for instruction-based image editing, making it particularly strong at localized changes while preserving facial identity. Unlike generic text-to-image models, it understands what to change and what to keep, which is critical for tasks like background replacement, lighting correction, or style adjustments without altering facial features.

Its architecture demonstrates:

  • Strong face consistency across edits

  • Better semantic alignment between prompt and output

  • Minimal identity drift compared to generic text-to-image approaches

This makes it well-suited for real-world photo booths and personalized image workflows.

Why 4-bit bnb Quantization?

To enable efficient inference on limited GPU resources, we applied 4-bit bnb quantization. This significantly reduces VRAM usage while maintaining perceptual quality, especially important for face-centric edits.

Key benefits:

  • ~70–75% reduction in memory footprint for the model weights, compared with 16-bit precision

  • Faster inference with negligible loss in facial detail

  • Practical for on-device or edge deployments

In our testing, face structure and expressions remained intact even under aggressive quantization.
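The memory figure above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weight storage only (activations and other runtime buffers are ignored). It assumes one 32-bit scale per block of 64 weights, which is a common blockwise 4-bit layout but is an assumption here. `NUM_PARAMS` is a hypothetical placeholder, not a figure from this post.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def reduction_vs_16bit(bits_per_param: float) -> float:
    """Fraction of weight memory saved relative to 16-bit storage."""
    return 1 - bits_per_param / 16


BLOCK_SIZE = 64   # weights sharing one scale (assumed)
SCALE_BITS = 32   # one 32-bit scale per block (assumed)
BITS_4BIT_WITH_SCALES = 4 + SCALE_BITS / BLOCK_SIZE  # 4.5 bits per weight

NUM_PARAMS = 20e9  # hypothetical parameter count, for illustration only

for label, bits in [
    ("16-bit", 16),
    ("4-bit, ideal", 4),
    ("4-bit + per-block scales", BITS_4BIT_WITH_SCALES),
]:
    gib = weight_memory_gib(NUM_PARAMS, bits)
    print(f"{label}: {gib:.1f} GiB, {reduction_vs_16bit(bits):.1%} smaller than 16-bit")
```

Ideal 4-bit storage saves 75% of weight memory. Once the per-block scales are included, the saving is about 72%, which is consistent with the 70–75% range quoted above.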

Models Explored (and Why They Fell Short)

We evaluated several popular alternatives:

  • Fooocus – Excellent for aesthetics and prompt adherence, but less reliable for strict face retention across edits.

  • InstantID – Strong identity control, but more rigid and less flexible for general image editing workflows.

  • Stable Diffusion – Highly versatile, but requires heavy tuning and auxiliary models to achieve consistent face preservation.

  • Flux – Promising generation quality, but not yet as stable for instruction-guided editing use cases.

Final Takeaway

By combining Qwen Image Edit with 4-bit bnb quantization, we achieved a system that is:

  • Identity-safe

  • Edit-precise

  • Resource-efficient

This makes it a practical and scalable choice for production-grade face-preserving image editing using Generative AI.

Why bitsandbytes?

bitsandbytes is widely adopted because it makes large models usable on limited hardware without heavy engineering effort.

The key reasons teams choose bitsandbytes are:

  • Massive VRAM savings
    Loading models in 8-bit or 4-bit precision drastically reduces GPU memory usage, enabling billion-parameter models to run on consumer GPUs.

  • Minimal accuracy loss
    Instead of naïve quantization, bitsandbytes preserves higher precision for sensitive computations, maintaining strong model quality even at low bit-widths.

  • Supports fine-tuning, not just inference
    Unlike many inference-only formats, bitsandbytes allows parameter-efficient fine-tuning (such as LoRA-style adapters) on quantized models.

  • Seamless ecosystem integration
    It works directly with modern transformer pipelines, requiring minimal changes to existing workflows.

  • Hardware-aware optimizations
    bitsandbytes leverages optimized low-level kernels to efficiently run quantized matrix operations on supported accelerators.

In practice, bitsandbytes is often the default choice when you want fast experimentation, fine-tuning, or deployment of large models on constrained GPUs, without committing to a specialized inference-only format.
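To make the contrast with naïve quantization concrete, here is a toy, pure-Python sketch of blockwise absmax quantization, where each small block of weights gets its own scale. It illustrates the general idea only: it uses uniform integer levels and is not the bitsandbytes kernel or its NF4 data type. The function names are made up for this example.

```python
from typing import List, Tuple

MAX_CODE = 7  # symmetric signed 4-bit codes in the range -7..7

Block = Tuple[float, List[int]]  # (per-block scale, quantized codes)


def quantize_blockwise(values: List[float], block_size: int) -> List[Block]:
    """Quantize values to 4-bit codes, storing one absmax scale per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        codes = [round(v / scale * MAX_CODE) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blockwise(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate values from scales and codes."""
    return [scale * code / MAX_CODE for scale, codes in blocks for code in codes]


# Four small weights followed by four large ones.
weights = [0.01, -0.02, 0.015, 0.005, 3.0, -2.5, 1.0, 0.5]

one_scale = dequantize_blockwise(quantize_blockwise(weights, block_size=8))
per_block = dequantize_blockwise(quantize_blockwise(weights, block_size=4))

print("single scale:", [round(v, 4) for v in one_scale])
print("per-block   :", [round(v, 4) for v in per_block])
```

With a single scale for all eight values, the large weights dominate and the small ones collapse to zero. With one scale per block of four, the small weights keep usable resolution, which is why blockwise schemes lose less accuracy at the same bit-width.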

Why not the others?

While multiple quantization approaches exist, each comes with trade-offs that make them better suited for specific scenarios. bitsandbytes is often preferred when flexibility and ease of use matter most.


GGUF (Why not GGUF here?)

GGUF is optimized for inference-only workflows, especially on CPUs and edge devices.
However:

  • It is not designed for fine-tuning or training

  • Models are typically locked into specific runtimes

  • Less suitable for GPU-centric experimentation workflows

If your goal involves iterating, experimenting, or adapting models, GGUF can feel restrictive compared to bitsandbytes.


AWQ (Why not AWQ here?)

AWQ excels at high-quality 4-bit inference, but:

  • It is primarily inference-focused

  • Requires pre-quantized checkpoints

  • Offers less flexibility for experimentation or training

AWQ shines when deploying a model at scale, but it is less convenient during rapid prototyping or research workflows.


GPTQ and similar methods

Other advanced quantization techniques (such as GPTQ-style methods):

  • Often require custom calibration steps

  • Are more sensitive to hardware and model architecture

  • May introduce friction into standard workflows

They deliver excellent compression, but at the cost of complexity and rigidity.


Why bitsandbytes stands out

bitsandbytes sits in the middle ground:

  • More flexible than inference-only formats

  • Easier to integrate than specialized quantizers

  • Supports both inference and parameter-efficient fine-tuning

  • Ideal for development, experimentation, and small-scale deployment

This makes it a practical default choice when working with large models on limited hardware.

Architecture and Integration

The architecture was intentionally simple, both to explain to a broad audience and to ensure reliability during a live event.

Step-by-step flow

  • Attendee scans their ticket at the booth and captures a photo

  • Image is securely sent to an API layer

  • API validates and forwards the request

  • Secure networking keeps all traffic protected

  • GPU-based EC2 instance runs AI inference

  • Final image is stored temporarily

  • Booth display retrieves and shows the result

This loose coupling between components ensured stability and smooth performance.
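The flow above can be sketched as a single request handler with the inference and storage steps passed in as dependencies, which is one way to get the loose coupling described. Everything here is an assumption made for illustration: the `BoothRequest` fields, the size limit, the JPEG check, and the key format are not taken from the booth's code, and a plain dictionary stands in for real storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed upload limit
JPEG_MAGIC = b"\xff\xd8\xff"         # leading bytes of a JPEG file


@dataclass
class BoothRequest:
    ticket_id: str      # hypothetical: identifier read from the scanned ticket
    image_bytes: bytes
    prompt: str


def validate(request: BoothRequest) -> None:
    """Reject requests the API layer should not forward."""
    if not request.ticket_id.strip():
        raise ValueError("missing ticket id")
    if not request.image_bytes.startswith(JPEG_MAGIC):
        raise ValueError("image is not a JPEG")
    if len(request.image_bytes) > MAX_UPLOAD_BYTES:
        raise ValueError("image too large")


def handle_request(
    request: BoothRequest,
    run_inference: Callable[[bytes, str], bytes],
    store: Dict[str, bytes],
) -> str:
    """Validate, run the edit, store the result, and return its key."""
    validate(request)
    edited = run_inference(request.image_bytes, request.prompt)
    key = f"edited/{request.ticket_id}.jpg"
    store[key] = edited
    return key


# Usage with stand-ins for the GPU worker and the object store.
fake_store: Dict[str, bytes] = {}
fake_inference = lambda image, prompt: image  # placeholder: returns input unchanged
request = BoothRequest("T-1", JPEG_MAGIC + b"pixels", "studio lighting")
print(handle_request(request, fake_inference, fake_store))
```

Because the handler only sees a callable and a mapping, the GPU worker or storage backend can be swapped or mocked without touching the validation logic.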

How It Was Hosted on AWS

AWS provided the backbone that made the live experience reliable and scalable.

AWS components and roles

  • GPU-enabled EC2 instance (g6e.xlarge) for AI inference

  • S3 object storage for temporary image storage

  • Secure networking for protected data flow

  • Cloud infrastructure that handled continuous usage

Benefits of using AWS:

  • Stable performance during a public event

  • Secure handling of user images

  • Easy path to scale for future events or multiple booths
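One way to keep S3 storage temporary, as listed above, is a lifecycle rule that expires objects after a set number of days. The post does not say how cleanup was handled, so the sketch below is an assumption: the helper name, prefix, and retention period are illustrative. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
import json


def temporary_storage_lifecycle(prefix: str, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under a prefix."""
    if expire_after_days < 1:
        raise ValueError("expire_after_days must be at least 1")
    rule_id = "expire-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Hypothetical prefix and retention period.
config = temporary_storage_lifecycle("booth/edited/", expire_after_days=1)
print(json.dumps(config, indent=2))
```

With boto3 installed and credentials configured, this dictionary could be passed as `LifecycleConfiguration` to `s3_client.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`, where the bucket name is whatever the deployment uses.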

Community Impact and Takeaways

The GenAI Photo Booth was more than an attraction. It acted as a learning tool for the community.

Impact on the community

  • Made Generative AI tangible and approachable

  • Helped attendees understand real-world AI systems

  • Connected cloud infrastructure concepts with live applications

  • Inspired students and developers to experiment

By experiencing Generative AI firsthand, attendees could see how models, quantization, and cloud infrastructure come together in a working system. The booth demonstrated that with the right design and cloud support, advanced AI can be built and showcased even at community-driven events.

In the end, the experience reinforced a simple idea. The best way to understand technology is often to interact with it directly.

The source code is available on GitHub.