# Evaluator Guide

This guide provides comprehensive instructions for evaluators using the Lakra system to review annotations and assess machine translation quality.

## Overview

As an evaluator, your role is to:

- Review annotations created by annotators
- Assess machine translation quality with AI assistance
- Provide constructive feedback on annotation quality
- Validate AI-generated quality assessments
- Track evaluation metrics and maintain quality standards

## Getting Started

### First Login

1. **Access Lakra**: Navigate to your Lakra instance URL
2. **Sign In**: Use your email or username and password
3. **Onboarding**: Complete the evaluator onboarding test if required
4. **Dashboard Access**: You'll see the evaluator dashboard

### Understanding Your Dashboard

Your evaluator dashboard shows:

- **Pending Evaluations**: Annotations awaiting your review
- **Pending Quality Assessments**: Translations needing AI-assisted review
- **Completed Reviews**: Your evaluation history
- **Performance Metrics**: Agreement rates, completion statistics
- **Queue Overview**: Workload distribution

## Evaluation Types

Lakra supports two main evaluation workflows:

### 1. Annotation Evaluation

Review and score annotations created by other annotators.

### 2. Quality Assessment

Evaluate machine translation quality with AI assistance.

## Annotation Evaluation Workflow

### Step 1: Select an Annotation to Review

1. Navigate to **"Pending Evaluations"**
2. Select an annotation from the list
3. You'll see:
   - Source text
   - Machine translation
   - Annotator's highlights and error markings
   - Annotator's quality ratings
   - Annotator's comments
   - Voice recording (if provided)

### Step 2: Review the Annotation

Carefully examine:

#### Error Highlights

- **Accuracy**: Are highlighted errors actually errors?
- **Completeness**: Are all errors caught?
- **Classification**: Are error types correct (MI_ST, MI_SE, MA_ST, MA_SE)?
- **Descriptions**: Are error explanations clear and accurate?
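The four error codes appear to pair a severity prefix with an error-type suffix. This guide does not define them, so the sketch below assumes `MI`/`MA` mean minor/major and `ST`/`SE` mean syntactic/semantic; confirm this against the [Annotator Guide](annotator-guide.md) before relying on it. The helper names (`decode_error_code`, `classification_matches`) are hypothetical and are not part of Lakra. The point it illustrates is that a classification can be half right: an annotator may choose the correct error type but the wrong severity, and feedback is more useful when it says which part was off.

```python
from dataclasses import dataclass

# ASSUMPTION: prefix encodes severity, suffix encodes error type.
# These meanings are inferred, not taken from Lakra's code.
SEVERITY = {"MI": "minor", "MA": "major"}
ERROR_TYPE = {"ST": "syntactic", "SE": "semantic"}


@dataclass(frozen=True)
class ErrorCode:
    severity: str
    error_type: str


def decode_error_code(code: str) -> ErrorCode:
    """Split a code such as 'MA_SE' into its severity and type parts."""
    prefix, sep, suffix = code.partition("_")
    if not sep or prefix not in SEVERITY or suffix not in ERROR_TYPE:
        raise ValueError(f"unknown error code: {code!r}")
    return ErrorCode(SEVERITY[prefix], ERROR_TYPE[suffix])


def classification_matches(annotator_code: str, evaluator_code: str) -> dict:
    """Check severity and type separately against the evaluator's own label."""
    a = decode_error_code(annotator_code)
    e = decode_error_code(evaluator_code)
    return {
        "severity_ok": a.severity == e.severity,
        "type_ok": a.error_type == e.error_type,
    }


if __name__ == "__main__":
    print(decode_error_code("MA_SE"))
    # ErrorCode(severity='major', error_type='semantic')
    print(classification_matches("MI_ST", "MA_ST"))
    # {'severity_ok': False, 'type_ok': True}
```

In the second call, the annotator labeled a syntactic error correctly but underrated its severity, which is the kind of detail worth naming under "Classification Issues" in your feedback.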
#### Quality Ratings

- **Fluency**: Does the rating match the translation's naturalness?
- **Adequacy**: Does it correctly reflect meaning preservation?
- **Overall Quality**: Is the holistic assessment appropriate?

#### Comments and Suggestions

- **Clarity**: Are comments understandable?
- **Usefulness**: Do suggestions actually improve the translation?
- **Correctness**: Are proposed corrections accurate?

#### Voice Recording (if present)

- Listen to the entire recording
- Verify that explanations match the written annotations
- Assess clarity and correctness

### Step 3: Provide Evaluation Scores

Rate the annotation on multiple dimensions (1-5 scale):

#### Accuracy Score

**How correct is the annotation?**

- **5 - Excellent**: All errors correctly identified and classified
- **4 - Good**: Minor issues with identification or classification
- **3 - Fair**: Several errors in accuracy or completeness
- **2 - Poor**: Significant inaccuracies or omissions
- **1 - Very Poor**: Mostly incorrect or incomplete

```{tip}
Compare the annotation against your own expert judgment of the translation.
```

#### Completeness Score

**How thorough is the annotation?**

- **5 - Excellent**: All significant errors caught and documented
- **4 - Good**: Only minor errors missed
- **3 - Fair**: Some important errors missed
- **2 - Poor**: Many errors overlooked
- **1 - Very Poor**: Most errors not identified

#### Overall Annotation Quality

**Your holistic assessment**

- **5 - Excellent**: Publication-ready annotation
- **4 - Good**: Usable with minimal improvements
- **3 - Fair**: Acceptable but needs refinement
- **2 - Poor**: Significant improvements needed
- **1 - Very Poor**: Does not meet quality standards

### Step 4: Provide Detailed Feedback

Write constructive feedback:

1. **Strengths**: What the annotator did well
2. **Areas for Improvement**: Specific issues to address
3. **Missed Errors**: Errors the annotator didn't catch
4. **Classification Issues**: Incorrect error categorizations
5. **Suggestions**: How to improve future annotations

**Best Practices for Feedback:**

✅ **Do**:

- Be specific with examples
- Explain your reasoning
- Offer constructive suggestions
- Acknowledge good work
- Maintain a professional tone

❌ **Don't**:

- Use harsh or personal language
- Provide vague criticism
- Focus only on negatives
- Assume bad faith
- Be inconsistent in standards

### Step 5: Submit Evaluation

Before submitting:

1. **Review** all scores
2. **Verify** that feedback is constructive and clear
3. **Double-check** for any missed points
4. Click **"Submit Evaluation"**

## Quality Assessment Workflow

Quality Assessment uses AI to help you evaluate machine translation quality efficiently.

### Step 1: Access Quality Assessment

1. Navigate to the **"Quality Assessment"** section
2. Select a translation to assess
3. You'll see:
   - Source text
   - Machine translation
   - AI-generated quality scores (if available)
   - AI-detected errors
   - AI explanation of issues

### Step 2: Review AI Assessment

The AI provides:

#### AI Quality Scores

- **Fluency Score**: AI assessment of naturalness
- **Adequacy Score**: AI assessment of meaning preservation
- **Overall Quality Score**: AI holistic rating
- **Confidence Levels**: How confident the AI is in its scores

#### AI Error Detection

- **Syntax Errors**: Grammatical issues detected
- **Semantic Errors**: Meaning-related problems
- **Error Locations**: Highlighted problematic spans
- **Severity Levels**: Minor vs. major issues

#### AI Explanations

- **Reasoning**: Why the AI gave these scores
- **Specific Issues**: Detailed problem descriptions
- **Improvement Suggestions**: AI-generated recommendations

### Step 3: Validate AI Assessment

Your job is to:

1. **Review AI Findings**: Are they accurate?
2. **Confirm or Reject**: Accept, reject, or modify AI suggestions
3. **Add Human Insight**: Provide expert judgment
4. **Identify Missed Issues**: Find what the AI didn't catch

#### Feedback Options

- **✓ Confirm**: AI assessment is correct
- **✗ Reject**: AI assessment is wrong (provide a reason)
- **⚠ Modify**: Partial agreement (adjust scores/explanations)

### Step 4: Provide Your Assessment

Based on AI suggestions and your expertise:

1. **Set Final Scores**: Adjust AI scores if needed
2. **Add Comments**: Explain your decisions
3. **Highlight Additional Issues**: Mark missed errors
4. **Provide Recommendations**: Suggest improvements

### Step 5: Submit Quality Assessment

Review and submit your validated assessment.

## Evaluation Best Practices

### Maintaining Consistency

1. **Use Guidelines**: Follow established criteria strictly
2. **Calibrate Regularly**: Review example evaluations
3. **Document Standards**: Keep notes on edge cases
4. **Discuss with Peers**: Align understanding with other evaluators

### Being Fair and Objective

- **Focus on Work, Not Person**: Evaluate the annotation, not the annotator
- **Apply Standards Equally**: Use the same criteria for all annotations
- **Consider Context**: Account for sentence difficulty and language-pair complexity
- **Avoid Bias**: Don't let previous annotations influence the current review

### Providing Valuable Feedback

**Effective Feedback Structure:**

```
✓ STRENGTHS:
- Correctly identified all major semantic errors
- Clear and detailed error descriptions
- Appropriate quality ratings for fluency

⚠ AREAS FOR IMPROVEMENT:
- Missed minor punctuation issues (line 2, "example")
- Could be more specific in correction suggestions
- Consider the domain context for terminology choices

💡 SUGGESTIONS:
- Review guidelines on minor syntax classification
- Try providing alternative translations in comments
- Great work overall - keep up the attention to detail!
```

## Working with AI Assistance

### Understanding AI Capabilities

**AI is good at:**

- Pattern recognition for common errors
- Objective grammar checking
- Consistency in scoring
- Processing large volumes

**AI has limitations with:**

- Cultural context and nuance
- Domain-specific terminology
- Creative or figurative language
- Ambiguous cases requiring human judgment

### When to Override AI

Trust your expertise and override the AI when:

- The AI misses cultural nuances
- Domain knowledge is needed
- Context is misunderstood
- Scores seem inconsistent with actual quality
- Explanations are wrong or misleading

### Improving AI Performance

Your feedback helps train the system:

- **Be specific** about why you disagree
- **Provide examples** of correct assessments
- **Note patterns** in AI mistakes
- **Report systematic issues** to administrators

## Interface Guide

### Annotation Review Interface

**Main Panel:**

- Split view: source and machine translation
- Highlight overlay showing the annotator's marks
- AI suggestions panel (if enabled)

**Sidebar:**

- Annotation details
- Annotator information
- Quality ratings
- Comments and voice recordings

**Evaluation Form:**

- Score inputs
- Feedback text area
- Submit/Cancel buttons

### Quality Assessment Interface

**Main Panel:**

- Source and translation display
- AI quality scores
- AI-detected errors
- Confidence indicators

**Sidebar:**

- Validation controls (Confirm/Reject/Modify)
- Your assessment form
- Additional notes

**Control Panel:**

- Navigation between assessments
- Filter and search options
- Export functionality

### Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `→` | Next evaluation |
| `←` | Previous evaluation |
| `Ctrl/Cmd + Enter` | Submit evaluation |
| `Tab` | Move to next field |
| `1-5` | Quick rating (when a score field is focused) |
| `A` | Confirm AI suggestion |
| `R` | Reject AI suggestion |

## Metrics and Performance

### Your Performance Indicators

Track your evaluation quality:

#### Agreement Rates

- **Inter-evaluator Agreement**: How closely your scores match those of other evaluators
- **AI Agreement Rate**: How often you confirm AI assessments
- **Consistency Score**: How consistent your own judgments are across comparable cases

#### Productivity Metrics

- **Evaluations Completed**: Total count
- **Average Time per Evaluation**: Efficiency indicator
- **Queue Processing Rate**: Workflow velocity

#### Quality Indicators

- **Feedback Quality**: How helpful your comments are
- **Accuracy Recognition**: How often your assessments match expert consensus
- **Calibration Score**: Alignment with standards

### Improving Your Performance

1. **Review Feedback**: Learn from quality assurance reviews
2. **Study Disagreements**: Understand why others rated differently
3. **Attend Calibration Sessions**: Align with team standards
4. **Request Difficult Cases**: Challenge yourself to improve
5. **Track Trends**: Monitor your metrics over time

## Handling Special Cases

### Difficult Evaluations

**Borderline Quality:**

- Use half-points or detailed explanations
- Document your decision rationale
- Consider requesting a second opinion

**Ambiguous Errors:**

- Note the ambiguity in your feedback
- Explain multiple valid interpretations
- Defer to guidelines when possible

**Incomplete Annotations:**

- Note what's missing
- Provide guidance on completeness
- Score based on what's present

### Disagreeing with Annotators

When you significantly disagree:

1. **Double-check**: Ensure you're correct
2. **Explain Clearly**: Give detailed reasoning in your feedback
3. **Be Respectful**: Keep a professional, constructive tone
4. **Provide Examples**: Show the correct approach
5. **Flag for Review**: Escalate if needed

### AI Errors or Bugs

If you encounter system issues:

1. **Document**: Take a screenshot and describe the problem
2. **Report**: Use the feedback/bug report feature
3. **Work Around**: Complete the evaluation manually if possible
4. **Note in Comments**: Mention the technical issue you encountered

## FAQ for Evaluators

**Q: How long should each evaluation take?**

A: Typically 10-15 minutes, but this varies with complexity. Prioritize quality over speed.

**Q: What if I'm unsure about a rating?**

A: Use the middle range (3) and explain your uncertainty in comments. Flag it for a second review if needed.

**Q: Should I always trust the AI assessment?**

A: No. Use the AI as a helpful tool, but apply your expert judgment. Override it when necessary.

**Q: How harsh should I be in evaluations?**

A: Be fair and accurate. Focus on helping annotators improve, not on being punitive.

**Q: Can annotators see who evaluated their work?**

A: This depends on system configuration. Ask your administrator.

**Q: What if I find a major error in many annotations?**

A: Report the pattern to administrators; it may indicate a need for additional annotator training.

## Tips for Success

### Effective Evaluation Strategies

1. **First Pass**: Quick overview to understand the annotation
2. **Detailed Review**: Careful examination of each highlighted error
3. **Comparison**: Check against your own expert assessment
4. **Feedback Composition**: Write clear, helpful comments
5. **Final Review**: Double-check scores and feedback before submitting

### Managing Workload

- **Prioritize**: Start with the oldest or most critical evaluations
- **Batch Similar**: Group evaluations by language pair or annotator
- **Take Breaks**: Prevent fatigue from affecting your judgment
- **Set Daily Goals**: Maintain steady progress
- **Balance**: Mix easy and difficult evaluations

### Continuous Improvement

- **Learn from Feedback**: Apply quality assurance input
- **Stay Current**: Review guideline updates
- **Participate in Calibration**: Attend regular alignment sessions
- **Share Knowledge**: Discuss challenging cases with peers
- **Mentor Annotators**: Help improve overall quality

## Next Steps

- Practice with sample evaluations
- Review the [Features](features.md) documentation
- Check the [FAQ](faq.md) for additional questions
- Consult the [Technical Manual](../technical-manual/index.rst) for system details

```{seealso}
For information on annotation standards, see the [Annotator Guide](annotator-guide.md).
```