For months, I struggled with a specific problem: I had excellent reference videos showing exactly the kind of cinematography or movement I wanted, but I couldn’t quite explain to Seedance 2.0 how I wanted to apply that reference to my specific project.
I’d upload a reference video and write a prompt, and the results would be hit-or-miss. Sometimes the model would nail it, understanding exactly what I meant and applying the reference perfectly. Other times, it would interpret the reference differently than I intended, or use it in ways that didn’t match my vision.
I realized the problem wasn’t the tool—it was my approach. I was treating the combination of reference videos and text prompts as two separate things. Once I started thinking about them as an integrated system where the reference and the prompt work together, my results improved dramatically.
Why Both Reference Videos and Text Prompts Matter
Let me be clear about what each element contributes:
Reference videos show the model what aesthetic or movement pattern you want. They provide visual examples—a specific camera movement, a particular editing style, a type of choreography, a lighting approach. The reference is concrete and visual.
Text prompts explain how you want to apply or adapt that reference to your specific context. They provide context, creative direction, and specificity that the reference alone might not communicate.
Together, they’re more powerful than either alone. A reference video without context is just an example. A text prompt without reference is just a description. But a reference video plus a well-written prompt is a complete creative instruction.
Early Mistakes I Made
Before I refined this approach, I made several mistakes:
Mistake One: Over-Relying on the Reference
I’d upload a reference video and write vague prompts like “use this style” or “make it like this reference video.” I assumed the model would understand the context automatically. It didn’t. The model would recreate elements of the reference literally rather than adapting them thoughtfully to my project.
Mistake Two: Ignoring What Made the Reference Good
I’d use a reference video because I liked it overall, but I wouldn’t specify what aspects I actually wanted replicated. Did I want the camera movement? The color grading? The pacing? The model had to guess, and it often guessed wrong.
Mistake Three: Conflicting Instructions
Sometimes I’d upload a reference showing one style, then write a prompt asking for something completely different. The model would try to honor both, resulting in a confused output that satisfied neither.
Mistake Four: Vague Application Instructions
I’d reference professional cinematography but forget to explain that I wanted it applied to a small product in a home setting, not a Hollywood film set. The reference and the prompt didn’t align on context.
The Framework I Developed
After recognizing these mistakes, I developed a structured approach to combining references and prompts. It has three steps:
Step One: Specify What You’re Referencing
Don’t just upload a reference and assume the model knows what to pay attention to. In your prompt, explicitly state what aspects of the reference you want replicated.
For example, instead of: “Use this reference video”
Write: “Reference the camera movement from @video1—specifically the smooth lateral tracking shot that pulls back to reveal the full scene.”
This clarity matters enormously. It tells the model “pay attention to the motion, not the lighting or the actors or the setting.”
Step Two: Provide Contextual Constraints
Explain how this reference should be adapted to your specific situation. What’s different between the reference and your project?
For instance: “The reference shows a professional studio setting, but I need the same camera movement applied to a living room with natural window lighting. Adjust for the available light sources.”
This bridges the gap between the reference (which might be in a very different context) and your actual needs.
Step Three: Be Explicit About Details
Rather than assuming the model understands your vision, spell out the specifics:
- What should the character or subject look like? (Reference an image or describe it)
- Where should the action take place? (Be specific about the setting)
- What’s the emotional tone or energy? (Calm, energetic, dramatic, humorous?)
- How should the reference be adapted stylistically? (Same movement, but more intimate; same camera work, but with a different mood?)
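One way to make the three steps repeatable is to fill in a small template before writing the prompt. The sketch below is illustrative only: `ReferenceSpec` and `build_prompt` are hypothetical names, not part of Seedance 2.0, and the code does nothing except assemble text that you would then paste into the prompt field yourself. The `@video1` tag follows the convention used in the examples in this post.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceSpec:
    """One uploaded reference and what to take from it (hypothetical helper)."""
    tag: str          # e.g. "@video1", the tag style used in this post
    aspect: str       # Step One: which aspect of the reference to replicate
    adaptation: str   # Step Two: how the project differs from the reference
    details: list[str] = field(default_factory=list)  # Step Three: explicit specifics


def build_prompt(spec: ReferenceSpec) -> str:
    """Join the three steps into one plain-text prompt."""
    # Refuse to build a prompt if any of the first two steps was skipped.
    for name in ("tag", "aspect", "adaptation"):
        if not getattr(spec, name).strip():
            raise ValueError(f"{name} must not be empty")
    sentences = [f"Reference the {spec.aspect} from {spec.tag}.", spec.adaptation]
    sentences.extend(spec.details)
    return " ".join(sentences)


spec = ReferenceSpec(
    tag="@video1",
    aspect="smooth lateral tracking shot that pulls back to reveal the full scene",
    adaptation=(
        "The reference shows a professional studio setting, but apply the same "
        "camera movement to a living room with natural window lighting."
    ),
    details=["Tone: calm and intimate."],
)
print(build_prompt(spec))
```

The value of a template like this is less the code than the forced pause: an empty `aspect` or `adaptation` raises an error, which is the same gap that produces hit-or-miss generations.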
A Real Project: Commercial Video With Complex References
I was working on a commercial video for a luxury watch brand. The client showed me three reference videos:
- A high-end jewelry commercial with beautiful product lighting and rotation shots
- A lifestyle video showing someone in an elegant setting
- A cinematic trailer with dramatic camera movements and color grading
The client wanted me to combine elements from all three into a 15-second product commercial. This was complex—three different references with different aesthetics and purposes.
Here’s how I structured my approach:
My Prompt Structure:
“Create a 15-second luxury watch commercial. Use these references:
From @video1 (jewelry commercial): Replicate the rotating product shot with precise lighting that highlights the watch face and metalwork. The product should dominate the frame, with careful attention to reflection and shine.
From @video2 (lifestyle video): Incorporate the aesthetic of elegance and refinement—the setting should feel upscale and sophisticated, but the watch remains the focal point, not the environment.
From @video3 (cinematic trailer): Apply the camera movement style—smooth, intentional, with subtle dramatic pauses—but tone down the intensity to suit a product video rather than an action trailer. The pacing should be deliberate.
Combine these elements: Start with the lifestyle setting from @video2, introduce the watch with the product lighting from @video1, and use the camera movements from @video3. The final result should feel like a cohesive commercial that borrows aesthetic elements from all three references without looking disjointed.”
Why This Works:
By being specific about what I wanted from each reference and how I wanted to integrate them, I gave the model clear direction. It wasn’t guessing about my intent. It understood exactly which aspects of each reference to prioritize and how to blend them.
Practical Techniques I’ve Learned
Technique One: “Borrow, Don’t Copy”
I explicitly tell the model to borrow aesthetic elements rather than recreate the reference. This phrasing helps the model understand that I want inspiration, not duplication.
Example: “Borrow the warm color grading from @image1, but apply it to the cooler environment from @image2.”
Technique Two: Reference Comparison
Sometimes I reference two different videos and ask the model to blend them or choose between them:
“Take the motion style from @video1 (fast and dynamic) and the pacing from @video2 (slow and contemplative), and find a middle ground that combines energy with moments of stillness.”
Technique Three: Explicit Constraints
I set boundaries to prevent the model from misinterpreting the reference:
“Use the camera angle from @video1, but keep the character’s appearance consistent with @image2. Do not adopt the dramatic color grading from @video1; instead, use natural lighting.”
Technique Four: Reference Sequencing
For longer videos, I reference different elements for different segments:
“In the first 5 seconds, use the camera movement style from @video1. In seconds 5-10, transition to the pacing and movement quality from @video2. In the final 5 seconds, return to @video1 style.”
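When a prompt is split into timed segments, it is easy to leave a gap or an overlap in the timeline by accident. A minimal sketch of a check, assuming a 15-second clip as in the example above: `sequence_prompt` is a hypothetical helper that only builds and validates text, and it makes no claim about how Seedance 2.0 parses timing internally.

```python
def sequence_prompt(segments: list[tuple[int, int, str]], total_seconds: int) -> str:
    """Turn (start, end, instruction) segments into one timed prompt.

    Raises ValueError if the segments overlap, leave a gap,
    or do not add up to total_seconds.
    """
    cursor = 0  # the second at which the next segment must start
    parts = []
    for start, end, instruction in segments:
        if start != cursor or end <= start:
            raise ValueError(
                f"segment {start}-{end} does not continue from second {cursor}"
            )
        parts.append(f"In seconds {start}-{end}, {instruction}.")
        cursor = end
    if cursor != total_seconds:
        raise ValueError(f"segments cover {cursor}s, expected {total_seconds}s")
    return " ".join(parts)


prompt = sequence_prompt(
    [
        (0, 5, "use the camera movement style from @video1"),
        (5, 10, "transition to the pacing and movement quality from @video2"),
        (10, 15, "return to the @video1 style"),
    ],
    total_seconds=15,
)
print(prompt)
```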
What Doesn’t Work Well
I’ve also learned what doesn’t work:
Too Many References
Uploading 9 different reference videos and asking the model to incorporate all of them creates confusion. The model doesn’t know which reference is most important. Stick to 2-4 key references.
Conflicting Aesthetics
If references have fundamentally different aesthetics—one high-fashion, one gritty realism—asking the model to blend them doesn’t work. Either choose one as primary and borrow elements from others, or acknowledge that they’re incompatible.
Expecting Literal Translation
A reference’s movement doesn’t always transfer cleanly to very different content. If the reference shows a person dancing and I want that movement applied to a robot, the model struggles with the leap, and the result rarely matches what I had in mind.
Vague Prompts With Multiple References
The more references you provide, the more specific your prompt needs to be. Vague prompts with multiple references produce confused results.
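Several of these pitfalls can be caught before generating anything. The sketch below is a rough pre-flight check, not a Seedance 2.0 feature: `lint_prompt`, `VAGUE_PHRASES`, and `MAX_REFERENCES` are hypothetical names, the phrase list is taken from the vague prompts quoted earlier in this post, and the limit of four reflects the 2-4 rule of thumb above rather than any platform limit.

```python
import re

# Vague wordings quoted earlier in this post; extend with your own habits.
VAGUE_PHRASES = ("use this style", "make it like this", "use this reference")
MAX_REFERENCES = 4  # rule of thumb from this post, not a platform limit


def lint_prompt(prompt: str, uploaded_tags: list[str]) -> list[str]:
    """Return warnings about a prompt; an empty list means nothing was flagged."""
    warnings = []
    if len(uploaded_tags) > MAX_REFERENCES:
        warnings.append(
            f"{len(uploaded_tags)} references uploaded; "
            f"consider keeping {MAX_REFERENCES} or fewer"
        )
    # Collect every @tag that the prompt text actually mentions.
    mentioned = set(re.findall(r"@\w+", prompt))
    for tag in uploaded_tags:
        if tag not in mentioned:
            warnings.append(f"{tag} is uploaded but never mentioned in the prompt")
    for tag in sorted(mentioned - set(uploaded_tags)):
        warnings.append(f"{tag} is mentioned but was not uploaded")
    lowered = prompt.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            warnings.append(f'vague phrase "{phrase}"; say which aspect to borrow')
    return warnings


for warning in lint_prompt("Use this style.", ["@video1"]):
    print(warning)
```

A check like this cannot judge whether two aesthetics are compatible; it only flags the mechanical problems, such as an uploaded reference that the prompt never explains.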
How This Improves Results
Using this integrated approach has noticeably improved my output quality:
Higher Consistency
Because I’m explicit about what I want from each reference, the model produces more consistent results across multiple generations. I regenerate less often.
Better Adaptation
The model understands how to adapt a reference to my specific context, rather than trying to replicate it literally. This means the reference inspires my video rather than constraining it.
More Creative Control
By specifying exactly how I want references applied, I’m actually more in control of the creative direction, not less. The reference becomes a tool I’m using deliberately, not a constraint.
Fewer Surprises
Vague prompts lead to unpredictable results. When I explain exactly what I want from each reference, the output is far more predictable, which means less iteration and faster production.
Current Workflow
When I have reference videos for a project, I now:
- Identify Key References – Which videos show which aspects of my vision?
- Label Them Mentally – What does each reference contribute? (Movement, aesthetic, mood, pacing, etc.)
- Write Structured Prompts – For each reference, explicitly state what I’m borrowing and how
- Provide Context – Explain how the reference should be adapted to my specific project
- Test and Iterate – If the first generation doesn’t nail it, adjust the prompt for clarity, not just the reference
This structured approach takes slightly longer than casually uploading a reference and writing a vague prompt. But it produces dramatically better results, which actually saves time overall.
Why This Matters
The combination of reference videos and text prompts is one of Seedance 2.0’s most powerful capabilities. But it’s also one that requires some intentionality to use well.
Too many creators treat the reference as the whole instruction and the text as secondary. Or vice versa—they treat the text as the instruction and the reference as decoration. In reality, they work together.
Understanding how to integrate them—how to use the reference to show visual examples while using the text to explain intent—unlocks the full creative potential of the tool.
For video creators, filmmakers, or anyone generating content that requires specific aesthetic choices or movement patterns, mastering this integration is genuinely valuable. It’s the difference between getting lucky occasionally and producing consistently excellent results.


