Comparing the Evaluation Results of Processed and Reduced Datasets Using Excel
Since your dataset was reduced from all attributes to only two (Status & Target), you should compare the evaluation metrics (e.g., accuracy, precision, recall, F1-score) of the classification models before and after reduction.
1. Collect Evaluation Results from Weka
After running the classification models on both datasets (full and reduced), record the following metrics for each evaluation setting you ran:
- Accuracy (%)
- Precision (%)
- Recall (%)
- F1-Score (%)
Evaluation settings to cover:
- Cross-validation (k=10, k=20)
- Percentage split (70:30, 80:20)
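Weka reports these values directly, but it can help to see how they follow from a confusion matrix when you sanity-check what you copy into Excel. The sketch below assumes a binary problem; the function name `classification_metrics` and the counts are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 as percentages."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    values = {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
    # Convert fractions to percentages, rounded for the Excel table.
    return {name: round(value * 100, 2) for name, value in values.items()}


# Hypothetical confusion-matrix counts, not taken from any real Weka run.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

For multi-class problems Weka lists precision, recall and F-measure per class plus a weighted average, so the per-class counts would need to be handled separately.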
2. Organizing Data in Excel
Step 1: Create a Table in Excel
Format your results into a table like the one below (the numbers are illustrative placeholders, not real results):
Dataset | Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|---|
Full (Before Reduction) | J48 | 85 | 84 | 83 | 84 |
Full (Before Reduction) | Random Forest | 88 | 87 | 86 | 87 |
Reduced (After Reduction) | J48 | 75 | 74 | 73 | 74 |
Reduced (After Reduction) | Random Forest | 78 | 77 | 76 | 77 |
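If you would rather not type the table by hand, one option is to generate a CSV file that Excel can open. This is a minimal sketch using the placeholder values from the table above; the helper name `results_to_csv` is an assumption made for this example.

```python
import csv
import io

HEADER = ["Dataset", "Model", "Accuracy (%)", "Precision (%)",
          "Recall (%)", "F1-Score (%)"]

# Placeholder values copied from the example table, not real results.
ROWS = [
    ["Full (Before Reduction)", "J48", 85, 84, 83, 84],
    ["Full (Before Reduction)", "Random Forest", 88, 87, 86, 87],
    ["Reduced (After Reduction)", "J48", 75, 74, 73, 74],
    ["Reduced (After Reduction)", "Random Forest", 78, 77, 76, 77],
]


def results_to_csv(header, rows):
    """Serialize the results table as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = results_to_csv(HEADER, ROWS)
print(csv_text)
```

Saving `csv_text` to a file with a `.csv` extension (opened with `newline=""` so line endings are not doubled) gives you a file that opens in Excel with one column per metric.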
Step 2: Insert a Bar Chart
- Select the table data.
- Go to Insert → Charts → Insert Column or Bar Chart.
- Choose Clustered Column (vertical bars, well suited to side-by-side comparison).
- Add data labels: right-click a bar and choose Add Data Labels.
Step 3: Customize the Chart
- Legend: Differentiate between Before Reduction and After Reduction.
- Axis Labels:
  - X-axis: Dataset & Model
  - Y-axis: Accuracy/Precision/Recall/F1-Score (%)
- Title: Comparison of Classification Performance Before and After Feature Reduction.
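Depending on how Excel interprets the selection, the legend may list the metrics rather than the two datasets. One way to get a Before/After legend is to reshape the data so each dataset becomes its own column for a single metric. The sketch below does that for accuracy, again with the placeholder values; `pivot_for_chart` is a hypothetical helper, not an Excel or Weka feature.

```python
# (dataset, model, accuracy, precision, recall, f1) with placeholder values.
RESULTS = [
    ("Full", "J48", 85, 84, 83, 84),
    ("Full", "Random Forest", 88, 87, 86, 87),
    ("Reduced", "J48", 75, 74, 73, 74),
    ("Reduced", "Random Forest", 78, 77, 76, 77),
]
METRICS = ["Accuracy", "Precision", "Recall", "F1-Score"]


def pivot_for_chart(results, metric_index):
    """Return {model: {dataset: value}} for one metric."""
    table = {}
    for dataset, model, *values in results:
        table.setdefault(model, {})[dataset] = values[metric_index]
    return table


accuracy_table = pivot_for_chart(RESULTS, METRICS.index("Accuracy"))

# Tab-separated output that can be pasted straight into a worksheet.
print("Model", "Before Reduction", "After Reduction", sep="\t")
for model, by_dataset in accuracy_table.items():
    print(model, by_dataset["Full"], by_dataset["Reduced"], sep="\t")
```

Charting this layout yields one cluster per model with two bars each, so the legend reads Before Reduction and After Reduction. Repeating it per metric gives one chart for each.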
Step 4: Analyze the Graph
- Compare Accuracy: Did it drop after reducing attributes?
- Compare Precision & Recall: Was there a significant performance loss?
- Discuss trade-offs:
  - If accuracy dropped only slightly but computation speed improved, the reduction may be acceptable.
  - If performance dropped significantly, the feature reduction was probably too aggressive.
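To make "slightly" and "significantly" concrete, you can compute the drop in percentage points for each model and metric and compare it with a cutoff. The sketch below uses the placeholder values from the table; the 5-point `THRESHOLD` is an arbitrary assumption, so pick a value that suits your project.

```python
# Placeholder values from the example table, not real results.
FULL = {
    "J48": {"Accuracy": 85, "Precision": 84, "Recall": 83, "F1-Score": 84},
    "Random Forest": {"Accuracy": 88, "Precision": 87, "Recall": 86, "F1-Score": 87},
}
REDUCED = {
    "J48": {"Accuracy": 75, "Precision": 74, "Recall": 73, "F1-Score": 74},
    "Random Forest": {"Accuracy": 78, "Precision": 77, "Recall": 76, "F1-Score": 77},
}

THRESHOLD = 5  # hypothetical cutoff, in percentage points


def metric_drops(full, reduced):
    """Return the drop (full minus reduced) per model and metric."""
    return {
        model: {metric: full[model][metric] - reduced[model][metric]
                for metric in full[model]}
        for model in full
    }


def verdicts(drops, threshold):
    """Label each model by its largest drop across all metrics."""
    return {
        model: "significant" if max(per_metric.values()) > threshold else "slight"
        for model, per_metric in drops.items()
    }


drops = metric_drops(FULL, REDUCED)
labels = verdicts(drops, THRESHOLD)
for model in drops:
    print(model, drops[model], labels[model])
```

With the placeholder numbers every metric falls by 10 points for both models, so both are labelled significant under this cutoff.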
Example Interpretation of Graph
- Before reduction: higher accuracy (~85%-88%), likely because the models had more features to draw on.
- After reduction: accuracy dropped to ~75%-78%, a loss of about 10 percentage points, because fewer features were available for classification.
- Trade-off: the model is now simpler and faster, but less accurate.
Conclusion
Your graph should show visually how dataset reduction affects performance. If the drop is too large, reconsider which attributes to retain.