Exp.1: Understanding Legalese with LLMs. Debrief

Can AI help us make sense of what we’re really agreeing to?
Executive Summary
In the first Mellonhead Labs experiment, participants explored how large language models (LLMs) handle complex, dense legal documents - specifically the Terms & Conditions from Ulta and Sephora. The aim was to discover what was similar, what was unique, and whether AI tools could help make these documents easier to digest for non-legal readers.
Participants used a range of models - ChatGPT, NotebookLM, and Microsoft Copilot - and tested prompt techniques including chaining, context-setting, role assignment, and comparative formatting. Results varied depending on the model and how prompts were constructed. NotebookLM stood out for its citation fidelity and structure; ChatGPT was conversational but skewed toward user assumptions; and Copilot surprised with emoji-enhanced, UI-focused output.
The group worked collaboratively - some preparing detailed flows, others experimenting live - and contributed shared learnings around prompt design, model behavior, and user bias. This community-led, real-time testing delivered on Mellonhead's mission to provide practical, accessible, people-focused AI education without the hype.
The Experiment
Compare the Terms & Conditions from Ulta and Sephora using AI.
Test different prompting techniques and tools (ChatGPT, NotebookLM, Copilot).
Surface what’s similar, what’s unique—and what matters to real people.
“The goal wasn’t to get it perfect—it was to try, learn, and reflect together.”
Consumer rights and data usage: What we found
Our experiment surfaced several data-related themes in these documents, many of which show the brands taking liberties with what consumers voluntarily provide while using their services.
Sephora’s terms include rights to repurpose user photos in social media.
Both brands require opt-ins that may waive legal rights.
Data retention periods and control options vary between Ulta and Sephora.
Consumers were surprised by the lack of transparency and default brand protection language.
Prompting AI to look through these terms uncovered fine print not usually caught.
"Loyalty cards... you're getting coupons and in exchange your data is just going to everybody."
"Ulta, for instance, I think didn't hold on to the data that long. It was pretty restricted who could have it."
"Sephora can use your photos on their socials... I don't think anybody submitting a review would reasonably expect that."
“After reading the Ulta policy, I was actually more comfortable. They didn’t hold onto the data that long. It was pretty restricted who could have it.”
AI Model Comparison — Based on Participant Feedback
NotebookLM
✅ Most accurate and structured
✅ Cites sources directly
✅ Summarizes without needing a prompt
“NotebookLM was very different—very clinical and just the answers and citing its sources.”
“It summarized the docs without asking, which helped clarify how I needed to prompt.”
“In its output, it cited the location in the source document. So I was confident in that.”
“It brought back too much unless I gave it categories—but still more trustworthy.”
💬 Summary:
Best for legal document accuracy. Most structured. High citation fidelity. Requires some prompt refinement to avoid info overload.
ChatGPT (GPT-4)
✅ Easy to use and conversational
⚠️ Can skew results based on user phrasing
⚠️ Does not cite sources unless explicitly asked
⚠️ May carry over prior context
“ChatGPT became more casual and started to skew answers based on my prompts.”
“It was giving me what it thought I wanted to hear.”
“ChatGPT doesn’t check its work unless you tell it.”
“In a new chat, it found an error it missed before.”
“Without categories, NotebookLM returned everything; ChatGPT reworded it with less control.”
💬 Summary:
Useful for summarization, but less reliable for legal nuance. Requires precise prompts and separate sessions for accuracy.
Copilot
✅ Simple summaries with visual formatting (emojis, icons)
⚠️ No mention of citations or document sourcing
⚠️ Light on legal nuance, more UX-focused
“Copilot... especially loves putting in little emojis, check marks.”
“It was trying to make this visually stimulating for me.”
💬 Summary:
Good for readability and visual learners. Not built for legal precision. Best used for casual summaries or UX-first experiences.
Overall Prompting Tactics for Legal Document Reviews
Clarity, context, and chaining made all the difference.
Add context (your use case or behaviors)
Prompt chaining: useful for working through long, complex, dense documents
Define categories clearly so the AI focuses its answers: "When I didn't provide categories, the output felt like a rewording of the entire T&C."
Use role-based prompts with precision: without structure and categories, role assignment alone did not produce impactful results
Provide examples: this produced clear, understandable output instead of a regurgitation of the source as another long, dense document to read
Ask the model to re-read; this boosted comprehension and reduced hallucination
Try 'extract' vs. 'summarize': "'Extract' is more faithful to the original document. 'Summarize' paraphrases and simplifies."
Validate output in a new chat or with a second model: this reduced the incidence of hallucinations. ChatGPT will not check its work unless prompted to; NotebookLM cited its sources.
“Less conversation and more direct instructions gave clearer answers.”
“I told it to reread the doc five times. That changed the output.”
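The tactics above (context, categories, chaining, 'extract' over 'summarize', and validating output) can be sketched in a few lines of Python. This is a minimal illustration, not what participants actually ran: the function names, the prompt wording, and the example categories are all hypothetical. It makes no calls to any model, so the prompts would be pasted into whichever tool is being tested. The second helper checks whether passages a model put in double quotes really appear in the source text, which is one rough way to spot paraphrase or hallucination.

```python
import re

# Hypothetical example categories; choose ones that match your own concerns.
CATEGORIES = ["data retention", "user content rights", "opt-out options"]


def build_chain(doc_name, categories, context):
    """Return an ordered list of prompts, to be sent one message at a time."""
    prompts = [
        f"Context: {context} You are reviewing the document '{doc_name}'. "
        "Re-read it in full before answering. Reply only 'ready'."
    ]
    # One extraction step per category keeps each answer narrow.
    for category in categories:
        prompts.append(
            f"Extract, do not summarize, every clause in '{doc_name}' about "
            f"{category}. Quote each clause verbatim in double quotes and "
            "give its section heading."
        )
    # Plain-language explanation comes last and is tied to the quoted clauses.
    prompts.append(
        "Using only the clauses quoted above, explain each category in "
        "plain language for a non-legal reader."
    )
    return prompts


def normalize(text):
    """Lowercase and collapse whitespace so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(model_output, source_text):
    """Return double-quoted passages in the output that are not in the source."""
    source = normalize(source_text)
    quotes = re.findall(r'"([^"]+)"', model_output)
    return [q for q in quotes if normalize(q) not in source]
```

Any passage returned by `unverified_quotes` is worth checking by hand against the original Terms & Conditions, or re-asking about in a fresh chat. The check only handles straight double quotes and exact wording, so it will flag legitimate quotes that the model lightly reformatted.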
Overall Ranking (Based on This Experiment)
✅ Sephora’s policy poses less risk to your personally identifiable information than Ulta’s.
🔒 Why Sephora is the safer option:
More restrictive age gate
Sephora requires users to be at least 16 or 18, depending on their state, and to have the legal capacity to consent.
Ulta allows users as young as 13 and provides limited verification or clarity on guardian consent.
No explicit biometric data collection
Sephora references biometric data in passing (e.g. facial scans in try-on tools) but does not state that it stores or processes this data.
Ulta explicitly collects and stores facial and hand geometry through its GLAMlab and virtual try-on tools, with retention up to 1 year.
Stronger opt-out mechanisms and revocation rights
Sephora clearly outlines how to cancel memberships, remove loyalty links, and opt out of communications.
Ulta mentions some opt-outs (such as for SMS) but does not clarify deletion of user content, revocation of rights, or full data removal.
Tighter third-party sharing rules
Sephora’s partners (like Kohl’s and Hearst) are named, with strict limits on data usage.
Ulta shares data with multiple third-party biometric firms, and no opt-out is clearly documented.
⚠️ Ulta poses greater risk due to:
Broad biometric data collection and sharing
Lenient minor access policies
Granting itself perpetual global rights to use user-uploaded content
Vague opt-out and deletion policies
🧭 Conclusion
If you're concerned about your face, voice, images, or identity being stored, reused, or monetized, Sephora offers a slightly stronger legal posture for user protection.
Ulta's use of biometric tools and community content gives it more control and introduces higher risk, especially for underage users or those unaware of how their likeness can be used.