Evaluator Guide
This guide provides comprehensive instructions for evaluators using the Lakra system to review annotations and assess machine translation quality.
Overview
As an evaluator, your role is to:
Review annotations created by annotators
Assess machine translation quality with AI assistance
Provide constructive feedback on annotation quality
Validate AI-generated quality assessments
Track evaluation metrics and maintain quality standards
Getting Started
First Login
Access Lakra: Navigate to your Lakra instance URL
Sign In: Enter your email address or username, then your password
Onboarding: Complete the evaluator onboarding test if required
Dashboard Access: You’ll see the evaluator dashboard
Understanding Your Dashboard
Your evaluator dashboard shows:
Pending Evaluations: Annotations awaiting your review
Pending Quality Assessments: Translations needing AI-assisted review
Completed Reviews: Your evaluation history
Performance Metrics: Agreement rates, completion statistics
Queue Overview: Workload distribution
Evaluation Types
Lakra supports two main evaluation workflows:
1. Annotation Evaluation
Review and score annotations created by annotators
2. Quality Assessment
Evaluate machine translation quality with AI assistance
Annotation Evaluation Workflow
Step 1: Select an Annotation to Review
Navigate to “Pending Evaluations”
Select an annotation from the list
You’ll see:
Source text
Machine translation
Annotator’s highlights and error markings
Annotator’s quality ratings
Annotator’s comments
Voice recording (if provided)
Step 2: Review the Annotation
Carefully examine:
Error Highlights
Accuracy: Are highlighted errors actually errors?
Completeness: Are all errors caught?
Classification: Are error types correct (MI_ST, MI_SE, MA_ST, MA_SE)?
Descriptions: Are error explanations clear and accurate?
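The accuracy, completeness, and classification checks above can be pictured as a comparison between the annotator's highlights and the ones you would have marked yourself. The following is a minimal sketch, not Lakra's implementation: it assumes each highlight is a (start, end) character span mapped to one of the four error-type codes, and it only counts spans as the same error when they match exactly. The function name and the data layout are hypothetical.

```python
# Error-type codes as listed in this guide; treated here as opaque labels.
VALID_TYPES = {"MI_ST", "MI_SE", "MA_ST", "MA_SE"}


def compare_highlights(annotator, evaluator):
    """Compare two {(start, end): error_type} mappings.

    Assumption: spans must match exactly to count as the same error.
    """
    for spans in (annotator, evaluator):
        unknown = set(spans.values()) - VALID_TYPES
        if unknown:
            raise ValueError(f"unknown error types: {sorted(unknown)}")

    shared = annotator.keys() & evaluator.keys()
    return {
        # Same span, same type: the annotator got it right.
        "agreed": sorted(s for s in shared if annotator[s] == evaluator[s]),
        # Same span, different type: a classification issue.
        "misclassified": sorted(s for s in shared if annotator[s] != evaluator[s]),
        # Marked only by the evaluator: a completeness issue.
        "missed": sorted(evaluator.keys() - annotator.keys()),
        # Marked only by the annotator: possibly not a real error.
        "spurious": sorted(annotator.keys() - evaluator.keys()),
    }


annotator_marks = {(0, 5): "MI_ST", (10, 14): "MA_SE", (20, 25): "MI_SE"}
evaluator_marks = {(0, 5): "MI_ST", (10, 14): "MI_SE", (30, 36): "MA_ST"}
report = compare_highlights(annotator_marks, evaluator_marks)
```

In practice overlapping spans that are not identical still deserve a manual look; the four buckets are only a starting point for the scores and feedback that follow.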
Quality Ratings
Fluency: Does the rating match the translation’s naturalness?
Adequacy: Does it correctly reflect meaning preservation?
Overall Quality: Is the holistic assessment appropriate?
Voice Recording (if present)
Listen to the entire recording
Verify explanations match written annotations
Assess clarity and correctness
Step 3: Provide Evaluation Scores
Rate the annotation on multiple dimensions (1-5 scale):
Accuracy Score
How correct is the annotation?
5 - Excellent: All errors correctly identified and classified
4 - Good: Minor issues with identification or classification
3 - Fair: Several errors in accuracy or completeness
2 - Poor: Significant inaccuracies or omissions
1 - Very Poor: Mostly incorrect or incomplete
Tip
Compare the annotation against your own expert judgment of the translation.
Completeness Score
How thorough is the annotation?
5 - Excellent: All significant errors caught and documented
4 - Good: Only minor errors missed
3 - Fair: Some important errors missed
2 - Poor: Many errors overlooked
1 - Very Poor: Most errors not identified
Overall Annotation Quality
Your holistic assessment
5 - Excellent: Publication-ready annotation
4 - Good: Usable with minimal improvements
3 - Fair: Acceptable but needs refinement
2 - Poor: Significant improvements needed
1 - Very Poor: Does not meet quality standards
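To make the three rubrics concrete, here is a small sketch of how one evaluation could be represented and validated. It is an illustration only: the class name, field names, and validation rule are assumptions, and it accepts whole numbers from 1 to 5 only, so an instance that allows half-points would need a looser check.

```python
from dataclasses import dataclass

# Labels shared by the three rubrics in this guide.
SCALE_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Very Poor"}
DIMENSIONS = ("accuracy", "completeness", "overall")


@dataclass(frozen=True)
class AnnotationEvaluation:
    """Hypothetical container for the three scores an evaluator assigns."""

    accuracy: int
    completeness: int
    overall: int

    def __post_init__(self):
        for name in DIMENSIONS:
            value = getattr(self, name)
            # Reject bools and floats explicitly; both would otherwise
            # pass a plain membership test against the integer keys.
            is_plain_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_plain_int or value not in SCALE_LABELS:
                raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")

    def labels(self):
        """Return the rubric label for each dimension."""
        return {name: SCALE_LABELS[getattr(self, name)] for name in DIMENSIONS}


evaluation = AnnotationEvaluation(accuracy=5, completeness=4, overall=3)
```

Keeping the three dimensions separate, rather than averaging them, mirrors the rubric: an annotation can be accurate in what it marks and still incomplete.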
Step 4: Provide Detailed Feedback
Write constructive feedback:
Strengths: What the annotator did well
Areas for Improvement: Specific issues to address
Missed Errors: Errors the annotator didn’t catch
Classification Issues: Incorrect error categorizations
Suggestions: How to improve future annotations
Best Practices for Feedback:
✅ Do:
Be specific with examples
Explain your reasoning
Offer constructive suggestions
Acknowledge good work
Maintain professional tone
❌ Don’t:
Use harsh or personal language
Provide vague criticism
Focus only on negatives
Assume bad faith
Be inconsistent in standards
Step 5: Submit Evaluation
Before submitting:
Review all scores
Verify feedback is constructive and clear
Double-check for any missed points
Click “Submit Evaluation”
Quality Assessment Workflow
Quality Assessment uses AI to help you evaluate machine translation quality efficiently.
Step 1: Access Quality Assessment
Navigate to “Quality Assessment” section
Select a translation to assess
You’ll see:
Source text
Machine translation
AI-generated quality scores (if available)
AI-detected errors
AI explanation of issues
Step 2: Review AI Assessment
The AI provides:
AI Quality Scores
Fluency Score: AI assessment of naturalness
Adequacy Score: AI assessment of meaning preservation
Overall Quality Score: AI holistic rating
Confidence Levels: How confident the AI is in its scores
AI Error Detection
Syntax Errors: Grammatical issues detected
Semantic Errors: Meaning-related problems
Error Locations: Highlighted problematic spans
Severity Levels: Minor vs. major issues
AI Explanations
Reasoning: Why the AI gave these scores
Specific Issues: Detailed problem descriptions
Improvement Suggestions: AI-generated recommendations
Step 3: Validate AI Assessment
Your job is to:
Review AI Findings: Are they accurate?
Confirm, Reject, or Modify: Accept, reject, or adjust each AI suggestion
Add Human Insight: Provide expert judgment
Identify Missed Issues: Find what AI didn’t catch
Feedback Options
✓ Confirm: AI assessment is correct
✗ Reject: AI assessment is wrong (provide reason)
⚠ Modify: Partial agreement (adjust scores/explanations)
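The three feedback options carry different obligations: a rejection needs a reason, and a modification needs adjusted scores or an explanation. The sketch below expresses those rules as a pre-submission check. All names here (`Decision`, `AIValidation`, `validation_problems`) are hypothetical and do not describe Lakra's actual data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Decision(Enum):
    CONFIRM = "confirm"
    REJECT = "reject"
    MODIFY = "modify"


@dataclass
class AIValidation:
    """Hypothetical record of one evaluator decision on an AI assessment."""

    decision: Decision
    reason: Optional[str] = None
    adjusted_scores: Optional[Dict[str, int]] = None


def validation_problems(validation: AIValidation) -> List[str]:
    """Return a list of problems; an empty list means ready to submit."""
    problems = []
    has_reason = bool((validation.reason or "").strip())
    if validation.decision is Decision.REJECT and not has_reason:
        problems.append("Reject requires a reason")
    if (
        validation.decision is Decision.MODIFY
        and not validation.adjusted_scores
        and not has_reason
    ):
        problems.append("Modify requires adjusted scores or an explanation")
    return problems


ok = validation_problems(AIValidation(Decision.CONFIRM))
missing = validation_problems(AIValidation(Decision.REJECT, reason="  "))
```

Returning a list of problems instead of raising on the first one lets a form show every missing field at once.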
Step 4: Provide Your Assessment
Based on AI suggestions and your expertise:
Set Final Scores: Adjust AI scores if needed
Add Comments: Explain your decisions
Highlight Additional Issues: Mark missed errors
Provide Recommendations: Suggest improvements
Step 5: Submit Quality Assessment
Review and submit your validated assessment.
Evaluation Best Practices
Maintaining Consistency
Use Guidelines: Follow established criteria strictly
Calibrate Regularly: Review example evaluations
Document Standards: Keep notes on edge cases
Discuss with Peers: Align understanding with other evaluators
Being Fair and Objective
Focus on Work, Not Person: Evaluate the annotation, not the annotator
Apply Standards Equally: Same criteria for all annotations
Consider Context: Sentence difficulty, language pair complexity
Avoid Bias: Don’t let an annotator’s previous work influence your current review
Providing Valuable Feedback
Effective Feedback Structure:
✓ STRENGTHS:
- Correctly identified all major semantic errors
- Clear and detailed error descriptions
- Appropriate quality ratings for fluency
⚠ AREAS FOR IMPROVEMENT:
- Missed minor punctuation issues (line 2, "example")
- Could be more specific in correction suggestions
- Consider the domain context for terminology choices
💡 SUGGESTIONS:
- Review guidelines on minor syntax classification
- Try providing alternative translations in comments
- Great work overall - keep up the attention to detail!
Working with AI Assistance
Understanding AI Capabilities
AI is good at:
Pattern recognition for common errors
Objective grammar checking
Consistency in scoring
Processing large volumes
AI has limitations with:
Cultural context and nuance
Domain-specific terminology
Creative or figurative language
Ambiguous cases requiring human judgment
When to Override AI
Trust your expertise and override AI when:
AI misses cultural nuances
Domain knowledge is needed
Context is misunderstood
Scores seem inconsistent with actual quality
Explanations are wrong or misleading
Improving AI Performance
Your feedback helps train the system:
Be specific about why you disagree
Provide examples of correct assessments
Note patterns in AI mistakes
Report systematic issues to administrators
Interface Guide
Annotation Review Interface
Main Panel:
Split view: source and machine translation
Highlight overlay showing annotator’s marks
AI suggestions panel (if enabled)
Sidebar:
Annotation details
Annotator information
Quality ratings
Comments and voice recordings
Evaluation Form:
Score inputs
Feedback text area
Submit/Cancel buttons
Quality Assessment Interface
Main Panel:
Source and translation display
AI quality scores
AI-detected errors
Confidence indicators
Sidebar:
Validation controls (Confirm/Reject/Modify)
Your assessment form
Additional notes
Control Panel:
Navigation between assessments
Filter and search options
Export functionality
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| | Next evaluation |
| | Previous evaluation |
| | Submit evaluation |
| | Move to next field |
| | Quick rating (when focused on score) |
| | Confirm AI suggestion |
| | Reject AI suggestion |
Metrics and Performance
Your Performance Indicators
Track your evaluation quality:
Agreement Rates
Inter-evaluator Agreement: Match with other evaluators
AI Agreement Rate: How often you confirm AI assessments
Consistency Score: How stable your own judgments are across similar items
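This guide does not state how Lakra computes these rates, so the following is only a sketch of two common ways to quantify agreement between evaluators who scored the same items: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The AI agreement rate is shown as the share of "confirm" decisions. Function names and input formats are assumptions.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Fraction of items on which two evaluators gave the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(scores_a, scores_b)) / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    observed = percent_agreement(scores_a, scores_b)
    n = len(scores_a)
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both evaluators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def ai_agreement_rate(decisions):
    """Share of AI assessments the evaluator confirmed unchanged."""
    if not decisions:
        raise ValueError("decisions must be non-empty")
    return decisions.count("confirm") / len(decisions)


evaluator_1 = [5, 4, 3, 4]
evaluator_2 = [5, 4, 2, 4]
raw = percent_agreement(evaluator_1, evaluator_2)
kappa = cohens_kappa(evaluator_1, evaluator_2)
ai_rate = ai_agreement_rate(["confirm", "modify", "confirm", "reject"])
```

Kappa is lower than raw agreement whenever some of the matches could have happened by chance, which is why the two figures are worth reading together.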
Productivity Metrics
Evaluations Completed: Total count
Average Time per Evaluation: Efficiency indicator
Queue Processing Rate: Workflow velocity
Quality Indicators
Feedback Quality: How helpful your comments are
Accuracy Recognition: How often your assessments match expert consensus
Calibration Score: Alignment with standards
Improving Your Performance
Review Feedback: Learn from quality assurance reviews
Study Disagreements: Understand why others rated differently
Attend Calibration Sessions: Align with team standards
Request Difficult Cases: Challenge yourself to improve
Track Trends: Monitor your metrics over time
Handling Special Cases
Difficult Evaluations
Borderline Quality:
Use half-points or detailed explanations
Document decision rationale
Consider requesting second opinion
Ambiguous Errors:
Note the ambiguity in feedback
Explain multiple valid interpretations
Defer to guidelines when possible
Incomplete Annotations:
Note what’s missing
Provide guidance on completeness
Score based on what’s present
Disagreeing with Annotators
When you significantly disagree:
Double-check: Re-examine the annotation to confirm your own reading is correct
Explain Clearly: Detailed reasoning in feedback
Be Respectful: Professional, constructive tone
Provide Examples: Show correct approach
Flag for Review: Escalate if needed
AI Errors or Bugs
If you encounter system issues:
Document: Screenshot and describe the problem
Report: Use the feedback/bug report feature
Work Around: Complete evaluation manually if possible
Note in Comments: Mention technical issue encountered
FAQ for Evaluators
Q: How long should each evaluation take? A: Typically 10-15 minutes, though this varies with complexity. Prioritize quality over speed.
Q: What if I’m unsure about a rating? A: Use the middle range (3) and explain your uncertainty in comments. Flag for a second review if needed.
Q: Should I always trust the AI assessment? A: No. Use AI as a helpful tool, but apply your expert judgment. Override when necessary.
Q: How harsh should I be in evaluations? A: Be fair and accurate. Focus on helping annotators improve, not being punitive.
Q: Can annotators see who evaluated their work? A: This depends on system configuration. Ask your administrator.
Q: What if I find the same major error in many annotations? A: Report the pattern to administrators; it may indicate a need for additional annotator training.
Tips for Success
Effective Evaluation Strategies
First Pass: Quick overview to understand the annotation
Detailed Review: Careful examination of each highlighted error
Comparison: Check against your own expert assessment
Feedback Composition: Write clear, helpful comments
Final Review: Double-check scores and feedback before submitting
Managing Workload
Prioritize: Start with oldest or most critical evaluations
Batch Similar: Group evaluations by language pair or annotator
Take Breaks: Prevent fatigue from affecting your judgment
Set Daily Goals: Maintain steady progress
Balance: Mix easy and difficult evaluations
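The first two strategies above (prioritize, then batch) can be sketched as a simple ordering of your pending queue. This is an illustration for planning your own work, not a Lakra feature: the item keys (`id`, `created`, `critical`, `language_pair`) and the language-pair values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def plan_queue(items):
    """Order items critical-first then oldest-first, grouped by language pair.

    Returns {language_pair: [item ids in working order]}.
    """
    # False sorts before True, so `not critical` puts critical items first.
    ordered = sorted(items, key=lambda item: (not item["critical"], item["created"]))
    batches = defaultdict(list)
    for item in ordered:
        batches[item["language_pair"]].append(item["id"])
    return dict(batches)


pending = [
    {"id": "a", "created": date(2024, 3, 2), "critical": False, "language_pair": "en-fr"},
    {"id": "b", "created": date(2024, 3, 1), "critical": False, "language_pair": "en-de"},
    {"id": "c", "created": date(2024, 3, 3), "critical": True, "language_pair": "en-fr"},
]
plan = plan_queue(pending)
```

Batches appear in the order their highest-priority item was encountered, so the batch containing the most urgent evaluation comes first.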
Continuous Improvement
Learn from Feedback: Apply quality assurance input
Stay Current: Review guideline updates
Participate in Calibration: Regular alignment sessions
Share Knowledge: Discuss challenging cases with peers
Mentor Annotators: Help improve overall quality
Next Steps
Practice with sample evaluations
Review the Features documentation
Check the FAQ for additional questions
Consult the Technical Manual for system details
See also
For information on annotation standards, see the Annotator Guide.
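Comments and Suggestions
When reviewing an annotation (Step 2 of the annotation evaluation workflow), also examine the annotator’s comments and suggestions: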
Clarity: Are comments understandable?
Usefulness: Do suggestions actually improve the translation?
Correctness: Are proposed corrections accurate?