The Tech Behind AWS Community Day Images
How a live GenAI Photo Booth came to life on AWS

Participants’ Experience at the GenAI Booth
From the moment attendees approached the booth, curiosity quickly turned into excitement. Most participants expected a simple photo capture, but what followed genuinely surprised them.
After taking a live photo, they watched their image transform in real time through Generative AI. The turnaround was fast enough to feel almost magical. Seeing their own face accurately preserved while the background, lighting, or style changed instantly left many attendees amazed.
Common reactions we observed:
Excitement – People gathered around to see others’ results, creating a buzz at the booth.
Surprise – Many didn’t expect such high-quality, realistic outputs within seconds.
Amazement – The fact that this was happening live, at an event, without delays or glitches, made the experience memorable.
For many participants, this was their first hands-on interaction with Generative AI. It came not through slides or demos but through something personal: their own photo. That personal connection made the technology feel real, approachable, and fun.
What the System Was (High-Level)
Behind the scenes, the system followed a clean, event-ready flow designed for speed, reliability, and scale:
Capture → Secure Upload → AI Edit → Store → Display
At the booth, the interaction felt instant and effortless. A photo was captured and immediately handed off to the cloud, where all the heavy lifting happened invisibly. This deliberate separation ensured that the local device stayed lightweight, responsive, and resilient even during peak crowd moments.
Capture
High-resolution images were captured locally with minimal on-device processing
The booth hardware focused only on speed and reliability, not compute
This avoided overheating, crashes, or performance drops during continuous usage
Secure Upload
Images were uploaded instantly over a secure channel
Authentication and access control were enforced at the cloud boundary
No sensitive data was stored permanently on the booth device
AI Edit
Once in the cloud, AI-based image enhancement and face-preserving edits were triggered automatically
GPU-intensive workloads ran entirely on scalable infrastructure
Processing could scale up or down based on real-time demand without affecting booth performance
Store
Edited images were stored in durable cloud storage
Each image was uniquely tagged for easy retrieval and tracking
This made it simple to reuse images later for emails, galleries, or analytics
Display
The final image was streamed back to the booth display within seconds
Participants saw their enhanced photo almost instantly, maintaining excitement and engagement
The fast feedback loop made the experience feel “magical” despite the complex backend
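The Store stage says each image was uniquely tagged for retrieval and tracking, but the article does not describe the tagging scheme. The sketch below shows one possible scheme. The `event/date/ticket/capture-id` key layout, the `make_object_keys` helper, and the file names are all hypothetical.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical key layout: <event>/<YYYY/MM/DD>/<ticket>/<capture-id>/<name>.jpg
SAFE_SEGMENT = re.compile(r"[A-Za-z0-9-]+")  # keeps "/" and spaces out of key segments


def make_object_keys(event: str, ticket_id: str, now: Optional[datetime] = None) -> dict:
    """Build object keys for one capture; the original and edited image share one capture ID."""
    for name, value in (("event", event), ("ticket_id", ticket_id)):
        if not SAFE_SEGMENT.fullmatch(value):
            raise ValueError(f"invalid {name}: {value!r}")
    now = now or datetime.now(timezone.utc)
    capture_id = uuid.uuid4().hex  # random 128-bit ID, so retakes never overwrite each other
    prefix = f"{event}/{now:%Y/%m/%d}/{ticket_id}/{capture_id}"
    return {
        "capture_id": capture_id,
        "original": f"{prefix}/original.jpg",
        "edited": f"{prefix}/edited.jpg",
    }


keys = make_object_keys("community-day", "T-0042")
print(keys["original"])
print(keys["edited"])
```

Because every key for a ticket shares a prefix, a later gallery or email job could find all of that attendee's images with a single prefix listing.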
The key design principle was intentional separation of concerns.
The booth handled interaction and capture
The cloud handled compute, security, and reliability
By pushing complexity into AWS and keeping the edge device simple, the system remained stable under load, easy to operate at a live event, and ready to scale without redesign. This is what allowed a smooth, high-energy participant experience while running a production-grade AI system behind the scenes.
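Keeping the edge device simple means the booth only uploads a capture and then waits for the cloud to finish. The article does not say how the display retrieved the finished image. One plausible approach is polling with capped exponential backoff, sketched below as an assumption. `wait_for_result` and its parameters are hypothetical. `fetch` stands in for whatever call checks storage, such as an HTTP GET on a pre-signed URL.

```python
import time
from typing import Callable, Optional

# Hypothetical polling helper; `fetch` stands in for the real storage check.


def wait_for_result(
    fetch: Callable[[str], Optional[bytes]],
    key: str,
    timeout_s: float = 30.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[bytes]:
    """Poll until `fetch(key)` returns image bytes, or return None once the timeout expires."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        result = fetch(key)
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            return None  # let the booth UI show a "still working" or retry message
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped to stay responsive
```

The `sleep` and `clock` parameters make the loop testable without real waiting. The cap on the delay keeps the display reacting within a few seconds once the edit lands.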
How It Was Implemented
Model Choice & Quantization Rationale
For our image editing and face-retention pipeline, we selected Qwen Image Edit as the primary model, paired with 4-bit bitsandbytes (bnb) quantization. This combination offered the best balance between identity preservation, edit precision, and deployment efficiency, while remaining fully open source.
Why Qwen Image Edit?

Qwen Image Edit is optimized for instruction-based image editing, making it particularly strong at localized changes while preserving facial identity. Unlike generic text-to-image models, it understands what to change and what to keep, which is critical for tasks like background replacement, lighting correction, or style adjustments without altering facial features.
In practice, it delivers:
Strong face consistency across edits
Better semantic alignment between prompt and output
Minimal identity drift compared to text-to-image diffusion pipelines that lack image conditioning
This makes it well-suited for real-world photo booths and personalized image workflows.
Why 4-bit bnb Quantization?
To enable efficient inference on limited GPU resources, we applied 4-bit bnb quantization. This significantly reduces VRAM usage while maintaining perceptual quality, which is especially important for face-centric edits.
Key benefits:
~70–75% reduction in weight memory compared with 16-bit weights
Lower VRAM requirements, so the model fits on smaller GPUs, with negligible loss in facial detail
Practical for on-device or edge deployments
In our testing, face structure and expressions remained intact even under aggressive quantization.
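The ~70–75% figure is consistent with simple arithmetic on weight storage. The sketch below assumes 16-bit (bf16) weights as the baseline. It uses the blockwise accounting described in the QLoRA paper: blocks of 64 weights, one 32-bit scaling constant per block, and optional double quantization of those constants. The 20-billion parameter count is an illustrative assumption, not a measurement from this project.

```python
def bits_per_weight(bits: int = 4, blocksize: int = 64, double_quant: bool = False) -> float:
    """Approximate storage cost per weight for blockwise 4-bit quantization.

    Each block stores one absmax scaling constant: 32 bits normally, or 8 bits plus
    one shared 32-bit constant per 256 blocks when double quantization is enabled.
    """
    if double_quant:
        return bits + 8 / blocksize + 32 / (blocksize * 256)
    return bits + 32 / blocksize


def weight_memory_gb(n_params: float, bits: float) -> float:
    """Weight memory in gigabytes (10**9 bytes)."""
    return n_params * bits / 8 / 1e9


N = 20e9  # hypothetical 20B-parameter model, for illustration only
bf16 = weight_memory_gb(N, 16)
q4 = weight_memory_gb(N, bits_per_weight())
q4_dq = weight_memory_gb(N, bits_per_weight(double_quant=True))
print(f"bf16: {bf16:.1f} GB | 4-bit: {q4:.1f} GB ({1 - q4 / bf16:.1%} smaller) | "
      f"4-bit + double quant: {q4_dq:.1f} GB ({1 - q4_dq / bf16:.1%} smaller)")
```

At 4.5 bits per weight the saving is about 72%, and with double quantization (about 4.13 bits) it is about 74%. Real savings are somewhat lower, because some layers are typically left unquantized and activations are not affected.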
Models Explored (and Why They Fell Short)
We evaluated several popular alternatives:
Fooocus – An SDXL-based tool that excels at aesthetics and prompt adherence, but is less reliable for strict face retention across edits.
InstantID – Strong identity control, but more rigid and less flexible for general image editing workflows.
Stable Diffusion – Highly versatile, but requires heavy tuning and auxiliary models to achieve consistent face preservation.
Flux – Promising generation quality, but not yet as stable for instruction-guided editing use cases.
Final Takeaway
By combining Qwen Image Edit with 4-bit bnb quantization, we achieved a system that is:
Identity-safe
Edit-precise
Resource-efficient
This makes it a practical and scalable choice for production-grade face-preserving image editing using Generative AI.
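The article does not show its loading code; the GitHub repository is the authoritative source. The sketch below shows how Qwen Image Edit could be loaded with 4-bit bitsandbytes quantization through Hugging Face diffusers. It assumes a diffusers release that includes `QwenImageEditPipeline`, and it quantizes only the diffusion transformer. The helper names `load_edit_pipeline` and `edit_photo` are hypothetical, and the sampling settings are illustrative rather than the booth's actual values.

```python
# Sketch only: helper names and sampling settings are assumptions, not the booth's code.
# Needs torch, diffusers (a release with Qwen-Image support), transformers, bitsandbytes, Pillow.
# Heavy imports live inside the functions so this module loads without a GPU.

MODEL_ID = "Qwen/Qwen-Image-Edit"  # public checkpoint on the Hugging Face Hub

# 4-bit settings for the diffusion transformer, the largest component of the pipeline.
BNB_4BIT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",          # 4-bit NormalFloat codebook
    "bnb_4bit_use_double_quant": True,     # also quantize the per-block scaling constants
    "bnb_4bit_compute_dtype": "bfloat16",  # weights are dequantized to bf16 for each matmul
}


def load_edit_pipeline(model_id: str = MODEL_ID):
    """Load Qwen Image Edit with a 4-bit transformer; other components stay in bf16."""
    import torch
    from diffusers import BitsAndBytesConfig, QwenImageEditPipeline, QwenImageTransformer2DModel

    kwargs = dict(BNB_4BIT_KWARGS)
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    transformer = QwenImageTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=BitsAndBytesConfig(**kwargs),
        torch_dtype=torch.bfloat16,
    )
    pipe = QwenImageEditPipeline.from_pretrained(
        model_id, transformer=transformer, torch_dtype=torch.bfloat16
    )
    return pipe.to("cuda")  # place all components on the GPU


def edit_photo(pipe, image_path: str, prompt: str, seed: int = 0):
    """Run one instruction-based edit and return a PIL image."""
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    with torch.inference_mode():
        result = pipe(
            image=image,
            prompt=prompt,
            negative_prompt=" ",
            true_cfg_scale=4.0,       # illustrative guidance strength; tune for your prompts
            num_inference_steps=50,   # fewer steps trade quality for latency
            generator=torch.manual_seed(seed),
        )
    return result.images[0]
```

A call might look like `edit_photo(pipe, "capture.jpg", "Replace the background with a conference stage, keep the person unchanged").save("edited.jpg")`. Loading takes a while, so in a booth setting the pipeline would be loaded once at server start and reused for every request.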
Why bitsandbytes?
bitsandbytes is widely adopted because it makes large models, including language and diffusion models, usable on limited hardware without heavy engineering effort.
The key reasons teams choose bitsandbytes are:
Massive VRAM savings
Loading models in 8-bit or 4-bit precision drastically reduces GPU memory usage, enabling billion-parameter models to run on consumer GPUs.
Minimal accuracy loss
Instead of naïve quantization, bitsandbytes preserves higher precision for sensitive computations. In 8-bit mode it keeps outlier features in 16-bit, and in 4-bit mode it runs matrix multiplications in a higher-precision compute dtype, maintaining strong model quality even at low bit-widths.
Supports fine-tuning, not just inference
Unlike many inference-only formats, bitsandbytes allows parameter-efficient fine-tuning (such as LoRA-style adapters) on quantized models.
Seamless ecosystem integration
It integrates directly with Hugging Face Transformers, Diffusers, and PEFT, requiring minimal changes to existing workflows.
Hardware-aware optimizations
bitsandbytes leverages optimized low-level kernels to efficiently run quantized matrix operations on supported accelerators.
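Much of the "minimal accuracy loss" comes from blockwise scaling. Weights are split into small blocks, each with its own scale, so a single outlier only coarsens the precision of its own block. The sketch below illustrates that idea with a simplified uniform 4-bit grid. It is an assumption-labeled toy, not bitsandbytes' implementation, which uses the non-uniform NF4 codebook for 4-bit weights.

```python
import random

# Simplified uniform 4-bit grid; bitsandbytes NF4 uses a non-uniform codebook instead.


def quantize_blockwise(values, blocksize=64, bits=4):
    """Symmetric absmax quantization: one float scale per block, small signed integer codes per value."""
    qmax = 2 ** (bits - 1) - 1  # 7, so codes lie in [-7, 7]
    codes, scales = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        absmax = max(abs(v) for v in block) or 1.0  # guard against an all-zero block
        scales.append(absmax)
        codes.extend(round(v / absmax * qmax) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, blocksize=64, bits=4):
    """Map integer codes back to floats using each block's scale."""
    qmax = 2 ** (bits - 1) - 1
    return [c / qmax * scales[i // blocksize] for i, c in enumerate(codes)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(128)]
weights[5] = 1.0  # one large outlier, in the first block only

# Per-block scales (blocksize 64) versus a single scale for the whole tensor.
blockwise = dequantize_blockwise(*quantize_blockwise(weights))
single = dequantize_blockwise(*quantize_blockwise(weights, blocksize=128), blocksize=128)

# Worst-case error on the second block, which contains no outlier.
err_blockwise = max(abs(w - r) for w, r in zip(weights[64:], blockwise[64:]))
err_single = max(abs(w - r) for w, r in zip(weights[64:], single[64:]))
print(f"max error, per-block scales: {err_blockwise:.5f}; single scale: {err_single:.5f}")
```

With a single scale, the outlier stretches the grid so much that the small weights all round to zero. With per-block scales, the second block's error stays within half a grid step of its own range.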
In practice, bitsandbytes is often the default choice when you want fast experimentation, fine-tuning, or deployment of large models on constrained GPUs, without committing to a specialized inference-only format.
Why not the others?
While multiple quantization approaches exist, each comes with trade-offs that make them better suited for specific scenarios. bitsandbytes is often preferred when flexibility and ease of use matter most.
GGUF (Why not GGUF here?)
GGUF is optimized for inference-only workflows, especially on CPUs and edge devices.
However:
It is not designed for fine-tuning or training
Models are typically locked into specific runtimes
Less suitable for GPU-centric experimentation workflows
If your goal involves iterating, experimenting, or adapting models, GGUF can feel restrictive compared to bitsandbytes.
AWQ (Why not AWQ here?)
AWQ excels at high-quality 4-bit inference, but:
It is primarily inference-focused
Requires pre-quantized checkpoints
Offers less flexibility for experimentation or training
AWQ shines when deploying a model at scale, but it is less convenient during rapid prototyping or research workflows.
GPTQ and similar methods
Other advanced quantization techniques (such as GPTQ-style methods):
Often require custom calibration steps
Are more sensitive to hardware and model architecture
May introduce friction into standard workflows
They deliver excellent compression, but at the cost of complexity and rigidity.
Why bitsandbytes stands out
bitsandbytes sits in the middle ground:
More flexible than inference-only formats
Easier to integrate than specialized quantizers
Supports both inference and parameter-efficient fine-tuning
Ideal for development, experimentation, and small-scale deployment
This makes it a practical default choice when working with large models on limited hardware.
Architecture and Integration
The architecture was intentionally simple, both to explain to a broad audience and to ensure reliability during a live event.

Step by step flow
Attendee scans their ticket and captures a photo at the booth
Image is securely sent to an API layer
API validates and forwards the request
Secure networking keeps all traffic protected
GPU based EC2 instance runs AI inference
Final image is stored temporarily
Booth display retrieves and shows the result
This loose coupling between components ensured stability and smooth performance.
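Rejecting malformed uploads at the validation step keeps bad requests from consuming GPU time. The article does not list the checks the API performed. The sketch below assumes three plausible ones: ticket ID format, a size limit, and the image file signature. The limits, the ticket pattern, and the `validate_upload` name are hypothetical.

```python
import re

MAX_UPLOAD_BYTES = 10 * 1024 * 1024              # hypothetical 10 MiB limit
TICKET_PATTERN = re.compile(r"[A-Z0-9-]{4,32}")  # hypothetical ticket ID format
JPEG_MAGIC = b"\xff\xd8\xff"                     # first bytes of a JPEG file
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"                 # 8-byte PNG signature


def validate_upload(ticket_id: str, body: bytes) -> list:
    """Return a list of validation errors; an empty list means the request can be forwarded."""
    errors = []
    if not TICKET_PATTERN.fullmatch(ticket_id or ""):
        errors.append("invalid ticket id")
    if not body:
        errors.append("empty image")
    elif len(body) > MAX_UPLOAD_BYTES:
        errors.append("image too large")
    elif not (body.startswith(JPEG_MAGIC) or body.startswith(PNG_MAGIC)):
        errors.append("unsupported image format")
    return errors
```

Checking magic bytes instead of trusting a client-supplied content type is cheap. It also means a non-image payload is rejected before it reaches the model.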
How It Was Hosted on AWS
AWS provided the backbone that made the live experience reliable and scalable.

AWS components and roles
GPU-enabled EC2 instance (g6e.xlarge, with one NVIDIA L40S GPU) for AI inference
Amazon S3 object storage for temporary image storage
Secure networking for protected data flow
Cloud infrastructure that handled continuous usage
Benefits of using AWS:
Stable performance during a public event
Secure handling of user images
Easy path to scale for future events or multiple booths
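The article does not document how the S3 storage was kept temporary. One option, sketched here as an assumption rather than the booth's documented configuration, is a lifecycle rule that expires objects after a few days. It can be combined with short-lived pre-signed URLs so the display can fetch a result without holding AWS credentials. The bucket name, prefix, retention period, and helper names are hypothetical. `put_bucket_lifecycle_configuration` and `generate_presigned_url` are standard boto3 S3 client methods.

```python
# Hypothetical helpers; only the two boto3 client methods they call are real API.


def build_expiry_rule(prefix: str, days: int) -> dict:
    """Lifecycle configuration that deletes objects under `prefix` after `days` days."""
    if days < 1:
        raise ValueError("S3 expiration must be at least 1 day")
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/') or 'all'}-after-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_expiry_rule(s3_client, bucket: str, prefix: str, days: int) -> None:
    """Apply the rule using a boto3 S3 client, for example boto3.client("s3").

    Note: this call replaces the bucket's entire existing lifecycle configuration.
    """
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_expiry_rule(prefix, days),
    )


def display_url(s3_client, bucket: str, key: str, expires_s: int = 300) -> str:
    """Short-lived HTTPS URL the booth display can use to download one edited image."""
    return s3_client.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=expires_s
    )
```

Passing the client in, rather than creating it inside the helpers, keeps credentials and region configuration in one place. It also lets the helpers be exercised with a stub in tests.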
Community Impact and Takeaways
The GenAI Photo Booth was more than an attraction. It acted as a learning tool for the community.
Impact on the community
Made Generative AI tangible and approachable
Helped attendees understand real-world AI systems
Connected cloud infrastructure concepts with live applications
Inspired students and developers to experiment
By experiencing Generative AI firsthand, attendees could see how models, quantization, and cloud infrastructure come together in a working system. The booth demonstrated that with the right design and cloud support, advanced AI can be built and showcased even at community-driven events.
In the end, the experience reinforced a simple idea. The best way to understand technology is often to interact with it directly.
The source can be found here on GitHub.
