Evaluator Guide

This guide provides comprehensive instructions for evaluators using the Lakra system to review annotations and assess machine translation quality.

Overview

As an evaluator, your role is to:

Review annotations created by annotators
Assess machine translation quality with AI assistance
Provide constructive feedback on annotation quality
Validate AI-generated quality assessments
Track evaluation metrics and maintain quality standards

Getting Started

Understanding Your Dashboard

Your evaluator dashboard shows:

Pending Evaluations: Annotations awaiting your review
Pending Quality Assessments: Translations needing AI-assisted review
Completed Reviews: Your evaluation history
Performance Metrics: Agreement rates, completion statistics
Queue Overview: Workload distribution

Evaluation Types

Lakra supports two main evaluation workflows:

1. Annotation Evaluation

Review and score annotations created by annotators

2. Quality Assessment

Evaluate machine translation quality with AI assistance

Annotation Evaluation Workflow

Step 1: Select an Annotation to Review

Navigate to “Pending Evaluations”
Select an annotation from the list
You’ll see:
- Source text
- Machine translation
- Annotator’s highlights and error markings
- Annotator’s quality ratings
- Annotator’s comments
- Voice recording (if provided)

Step 2: Review the Annotation

Carefully examine:

Error Highlights

Accuracy: Are highlighted errors actually errors?
Completeness: Are all errors caught?
Classification: Are error types correct (MI_ST, MI_SE, MA_ST, MA_SE)?
Descriptions: Are error explanations clear and accurate?
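
The four classification codes can be read as a severity prefix plus an error-type suffix. The sketch below assumes that MI/MA stand for minor/major and ST/SE for syntactic/semantic, which matches the syntax/semantic and minor/major distinctions used elsewhere in this guide. The `ErrorType` enum and `describe` helper are hypothetical names for illustration, not part of Lakra.

```python
from enum import Enum


class ErrorType(Enum):
    """Error codes from this guide; the expansions are an assumed reading."""
    MI_ST = ("minor", "syntactic")
    MI_SE = ("minor", "semantic")
    MA_ST = ("major", "syntactic")
    MA_SE = ("major", "semantic")

    @property
    def severity(self) -> str:
        return self.value[0]

    @property
    def kind(self) -> str:
        return self.value[1]


def describe(code: str) -> str:
    """Turn a code such as 'MA_SE' into a readable label."""
    error = ErrorType[code]  # raises KeyError for an unknown code
    return f"{error.severity} {error.kind} error"


print(describe("MA_SE"))  # major semantic error
```

Reading each code as two independent parts can make it easier to check severity and error type separately when reviewing an annotator's classification.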

Quality Ratings

Fluency: Does the rating match the translation’s naturalness?
Adequacy: Does it correctly reflect meaning preservation?
Overall Quality: Is the holistic assessment appropriate?

Comments and Suggestions

Clarity: Are comments understandable?
Usefulness: Do suggestions actually improve the translation?
Correctness: Are proposed corrections accurate?

Voice Recording (if present)

Listen to the entire recording
Verify explanations match written annotations
Assess clarity and correctness

Step 3: Provide Evaluation Scores

Rate the annotation on multiple dimensions (1-5 scale):

Accuracy Score

How correct is the annotation?

5 - Excellent: All errors correctly identified and classified
4 - Good: Minor issues with identification or classification
3 - Fair: Several errors in accuracy or completeness
2 - Poor: Significant inaccuracies or omissions
1 - Very Poor: Mostly incorrect or incomplete

Tip

Compare the annotation against your own expert judgment of the translation.

Completeness Score

How thorough is the annotation?

5 - Excellent: All significant errors caught and documented
4 - Good: Only minor errors missed
3 - Fair: Some important errors missed
2 - Poor: Many errors overlooked
1 - Very Poor: Most errors not identified

Overall Annotation Quality

Your holistic assessment

5 - Excellent: Publication-ready annotation
4 - Good: Usable with minimal improvements
3 - Fair: Acceptable but needs refinement
2 - Poor: Significant improvements needed
1 - Very Poor: Does not meet quality standards
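
As a rough illustration of the three rubrics above, the following sketch models one evaluation as a record of whole-number scores from 1 to 5 with the labels used in this guide. The `EvaluationScores` class and its field names are assumptions made for the example; Lakra's actual data model may differ.

```python
from dataclasses import dataclass

# Labels shared by all three rubrics in this guide.
SCORE_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Very Poor"}
DIMENSIONS = ("accuracy", "completeness", "overall")


@dataclass(frozen=True)
class EvaluationScores:
    """Hypothetical container for one evaluation's scores."""
    accuracy: int
    completeness: int
    overall: int

    def __post_init__(self) -> None:
        # Reject anything outside the 1-5 scale before it is stored.
        for name in DIMENSIONS:
            value = getattr(self, name)
            if not isinstance(value, int) or value not in SCORE_LABELS:
                raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")

    def summary(self) -> str:
        return ", ".join(
            f"{name}: {getattr(self, name)} ({SCORE_LABELS[getattr(self, name)]})"
            for name in DIMENSIONS
        )


print(EvaluationScores(accuracy=4, completeness=5, overall=4).summary())
# accuracy: 4 (Good), completeness: 5 (Excellent), overall: 4 (Good)
```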

Step 4: Provide Detailed Feedback

Write constructive feedback:

Strengths: What the annotator did well
Areas for Improvement: Specific issues to address
Missed Errors: Errors the annotator didn’t catch
Classification Issues: Incorrect error categorizations
Suggestions: How to improve future annotations

Best Practices for Feedback:

✅ Do:

Be specific with examples
Explain your reasoning
Offer constructive suggestions
Acknowledge good work
Maintain a professional tone

❌ Don’t:

Use harsh or personal language
Provide vague criticism
Focus only on negatives
Assume bad faith
Be inconsistent in standards

Step 5: Submit Evaluation

Before submitting:

Review all scores
Verify feedback is constructive and clear
Double-check for any missed points
Click “Submit Evaluation”

Quality Assessment Workflow

Quality Assessment uses AI to help you evaluate machine translation quality efficiently.

Step 1: Access Quality Assessment

Navigate to the “Quality Assessment” section
Select a translation to assess
You’ll see:
- Source text
- Machine translation
- AI-generated quality scores (if available)
- AI-detected errors
- AI explanation of issues

Step 2: Review AI Assessment

The AI provides:

AI Quality Scores

Fluency Score: AI assessment of naturalness
Adequacy Score: AI assessment of meaning preservation
Overall Quality Score: AI holistic rating
Confidence Levels: How confident the AI is in its scores

AI Error Detection

Syntax Errors: Grammatical issues detected
Semantic Errors: Meaning-related problems
Error Locations: Highlighted problematic spans
Severity Levels: Minor vs. major issues

AI Explanations

Reasoning: Why the AI gave these scores
Specific Issues: Detailed problem descriptions
Improvement Suggestions: AI-generated recommendations

Step 3: Validate AI Assessment

Your job is to:

Review AI Findings: Are they accurate?
Confirm or Reject: Accept or modify AI suggestions
Add Human Insight: Provide expert judgment
Identify Missed Issues: Find what AI didn’t catch

Feedback Options

✓ Confirm: AI assessment is correct
✗ Reject: AI assessment is wrong (provide reason)
⚠ Modify: Partial agreement (adjust scores/explanations)
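
The three feedback options carry different obligations: a rejection needs a reason, and a modification needs adjusted scores or explanations. A minimal sketch of that rule follows, assuming a hypothetical `AIValidation` record; the class, its fields, and the score key `"fluency"` are illustrative and not Lakra's real API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AIValidation:
    """Hypothetical record of an evaluator's response to an AI assessment."""
    decision: str  # "confirm", "reject", or "modify"
    reason: Optional[str] = None
    adjusted_scores: Dict[str, int] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """Return what is still missing before the response can be submitted."""
        issues = []
        has_reason = bool((self.reason or "").strip())
        if self.decision not in ("confirm", "reject", "modify"):
            issues.append(f"unknown decision: {self.decision!r}")
        if self.decision == "reject" and not has_reason:
            issues.append("a rejection needs a reason")
        if self.decision == "modify" and not self.adjusted_scores and not has_reason:
            issues.append("a modification needs adjusted scores or an explanation")
        return issues


print(AIValidation("reject").problems())  # ['a rejection needs a reason']
print(AIValidation("modify", adjusted_scores={"fluency": 3}).problems())  # []
```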

Step 4: Provide Your Assessment

Based on AI suggestions and your expertise:

Set Final Scores: Adjust AI scores if needed
Add Comments: Explain your decisions
Highlight Additional Issues: Mark missed errors
Provide Recommendations: Suggest improvements

Step 5: Submit Quality Assessment

Review and submit your validated assessment.

Evaluation Best Practices

Maintaining Consistency

Use Guidelines: Follow established criteria strictly
Calibrate Regularly: Review example evaluations
Document Standards: Keep notes on edge cases
Discuss with Peers: Align understanding with other evaluators

Being Fair and Objective

Focus on Work, Not Person: Evaluate the annotation, not the annotator
Apply Standards Equally: Same criteria for all annotations
Consider Context: Sentence difficulty, language pair complexity
Avoid Bias: Don’t let previous annotations influence the current review

Providing Valuable Feedback

Effective Feedback Structure:

✓ STRENGTHS:
- Correctly identified all major semantic errors
- Clear and detailed error descriptions
- Appropriate quality ratings for fluency

⚠ AREAS FOR IMPROVEMENT:
- Missed minor punctuation issues (line 2, "example")
- Could be more specific in correction suggestions
- Consider the domain context for terminology choices

💡 SUGGESTIONS:
- Review guidelines on minor syntax classification
- Try providing alternative translations in comments
- Great work overall - keep up the attention to detail!

Working with AI Assistance

Understanding AI Capabilities

AI is good at:

Pattern recognition for common errors
Objective grammar checking
Consistency in scoring
Processing large volumes

AI has limitations with:

Cultural context and nuance
Domain-specific terminology
Creative or figurative language
Ambiguous cases requiring human judgment

When to Override AI

Trust your expertise and override AI when:

AI misses cultural nuances
Domain knowledge is needed
Context is misunderstood
Scores seem inconsistent with actual quality
Explanations are wrong or misleading

Improving AI Performance

Your feedback helps train the system:

Be specific about why you disagree
Provide examples of correct assessments
Note patterns in AI mistakes
Report systematic issues to administrators

Interface Guide

Annotation Review Interface

Main Panel:

Split view: source and machine translation
Highlight overlay showing annotator’s marks
AI suggestions panel (if enabled)

Sidebar:

Annotation details
Annotator information
Quality ratings
Comments and voice recordings

Evaluation Form:

Score inputs
Feedback text area
Submit/Cancel buttons

Quality Assessment Interface

Main Panel:

Source and translation display
AI quality scores
AI-detected errors
Confidence indicators

Sidebar:

Validation controls (Confirm/Reject/Modify)
Your assessment form
Additional notes

Control Panel:

Navigation between assessments
Filter and search options
Export functionality

Keyboard Shortcuts

Shortcut	Action
`→`	Next evaluation
`←`	Previous evaluation
`Ctrl/Cmd + Enter`	Submit evaluation
`Tab`	Move to next field
`1-5`	Quick rating (when focused on score)
`A`	Confirm AI suggestion
`R`	Reject AI suggestion

Metrics and Performance

Your Performance Indicators

Track your evaluation quality:

Agreement Rates

Inter-evaluator Agreement: Match with other evaluators
AI Agreement Rate: How often you confirm AI assessments
Consistency Score: How stable your own judgments are across similar cases
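
This guide does not state how Lakra computes these rates. As one plausible reading, the sketch below treats inter-evaluator agreement as the share of items where two evaluators gave the same score, and the AI agreement rate as the share of AI assessments confirmed without changes. Both function names and the exact-match definition are assumptions; chance-corrected statistics such as Cohen's kappa are a common alternative.

```python
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of items on which two evaluators gave the same score."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def ai_agreement_rate(decisions: Sequence[str]) -> float:
    """Share of AI assessments that were confirmed without changes."""
    if not decisions:
        raise ValueError("no decisions recorded")
    return sum(1 for d in decisions if d == "confirm") / len(decisions)


mine = [5, 4, 3, 4, 2]  # made-up scores for five annotations
peer = [5, 3, 3, 4, 1]
print(percent_agreement(mine, peer))  # 0.6
print(ai_agreement_rate(["confirm", "modify", "confirm", "reject"]))  # 0.5
```

A very high AI agreement rate is not automatically good: it can also mean AI scores are being accepted without enough scrutiny.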

Productivity Metrics

Evaluations Completed: Total count
Average Time per Evaluation: Efficiency indicator
Queue Processing Rate: Workflow velocity

Quality Indicators

Feedback Quality: How helpful your comments are
Accuracy Recognition: How often your assessments match expert consensus
Calibration Score: Alignment with standards

Improving Your Performance

Review Feedback: Learn from quality assurance reviews
Study Disagreements: Understand why others rated differently
Attend Calibration Sessions: Align with team standards
Request Difficult Cases: Challenge yourself to improve
Track Trends: Monitor your metrics over time

Handling Special Cases

Difficult Evaluations

Borderline Quality:

Use half-points where the score input accepts them, or explain the borderline call in detail
Document decision rationale
Consider requesting a second opinion

Ambiguous Errors:

Note the ambiguity in feedback
Explain multiple valid interpretations
Defer to guidelines when possible

Incomplete Annotations:

Note what’s missing
Provide guidance on completeness
Score based on what’s present

Disagreeing with Annotators

When you significantly disagree:

Double-check: Ensure you’re correct
Explain Clearly: Detailed reasoning in feedback
Be Respectful: Professional, constructive tone
Provide Examples: Show the correct approach
Flag for Review: Escalate if needed

AI Errors or Bugs

If you encounter system issues:

Document: Screenshot and describe the problem
Report: Use the feedback/bug report feature
Work Around: Complete evaluation manually if possible
Note in Comments: Mention the technical issue you encountered

FAQ for Evaluators

Q: How long should each evaluation take?
A: Typically 10-15 minutes, but this varies with complexity. Prioritize quality over speed.

Q: What if I’m unsure about a rating?
A: Use the middle range (3) and explain your uncertainty in comments. Flag for a second review if needed.

Q: Should I always trust the AI assessment?
A: No. Use AI as a helpful tool, but apply your expert judgment. Override when necessary.

Q: How harsh should I be in evaluations?
A: Be fair and accurate. Focus on helping annotators improve, not on being punitive.

Q: Can annotators see who evaluated their work?
A: This depends on system configuration. Ask your administrator.

Q: What if I find the same major error in many annotations?
A: Report the pattern to administrators; it may indicate a need for additional annotator training.

Tips for Success

Effective Evaluation Strategies

First Pass: Quick overview to understand the annotation
Detailed Review: Careful examination of each highlighted error
Comparison: Check against your own expert assessment
Feedback Composition: Write clear, helpful comments
Final Review: Double-check scores and feedback before submitting

Managing Workload

Prioritize: Start with oldest or most critical evaluations
Batch Similar: Group evaluations by language pair or annotator
Take Breaks: Prevent fatigue from affecting your judgment
Set Daily Goals: Maintain steady progress
Balance: Mix easy and difficult evaluations
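
The first two tips (oldest first, batch by language pair) can be combined into a simple ordering rule. The sketch below is one way to do that outside the tool, for example on an exported list. The `plan_queue` function, the item fields, and the sample language pairs and dates are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def plan_queue(items):
    """Group pending items by language pair; oldest batch and oldest item first."""
    batches = defaultdict(list)
    for item in items:
        batches[item["language_pair"]].append(item)
    for batch in batches.values():
        batch.sort(key=lambda item: item["created"])
    # Order the batches by the age of their oldest item.
    return sorted(batches.values(), key=lambda batch: batch[0]["created"])


# Made-up sample data for illustration only.
queue = [
    {"id": 1, "language_pair": "en-tl", "created": date(2024, 3, 2)},
    {"id": 2, "language_pair": "en-ceb", "created": date(2024, 3, 1)},
    {"id": 3, "language_pair": "en-tl", "created": date(2024, 2, 27)},
]
print([[item["id"] for item in batch] for batch in plan_queue(queue)])
# [[3, 1], [2]]
```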

Continuous Improvement

Learn from Feedback: Apply quality assurance input
Stay Current: Review guideline updates
Participate in Calibration: Regular alignment sessions
Share Knowledge: Discuss challenging cases with peers
Mentor Annotators: Help improve overall quality

Next Steps

Practice with sample evaluations
Review the Features documentation
Check the FAQ for additional questions
Consult the Technical Manual for system details