Results

  • Experiment 1: How does the size of the training set affect the performance of the trained model?

     For this experiment, we randomly generated three datasets of varying size from the training set.
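
      As a rough illustration of the sampling step, the sketch below draws random subsets of different sizes from a list of records. The function name `sample_subsets`, the stand-in training set, and the three sizes are all hypothetical; the sizes actually used are not stated in the text.

```python
import random

def sample_subsets(records, sizes, seed=0):
    """Draw one random subset (without replacement) per requested size."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return {n: rng.sample(records, n) for n in sizes}

# Hypothetical stand-in for the training set: 10,000 record ids.
training_set = list(range(10_000))

# Hypothetical sizes, chosen only for illustration.
subsets = sample_subsets(training_set, sizes=[1_000, 3_000, 9_000])

for n, subset in subsets.items():
    print(n, len(subset))
```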

      We experimented with a variety of classifiers, including Naïve Bayes, Bayes Net, kNN (k=1, k=3, k=5), decision tree, and OneR. We used 10-fold cross-validation and compared the validation precision and recall across the datasets of different sizes.
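
      The two metrics and the fold split can be sketched in plain Python as follows. This is a minimal illustration, not the tooling used for the experiments: the helper names and the toy labels are invented, and the toy example uses 3 folds rather than 10 only because it has six records.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle 0..n_items-1 and split the indices into k near-equal folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]

def precision(actual, predicted, label):
    """Share of predictions of `label` that are correct (None if none were made)."""
    truths = [a for a, p in zip(actual, predicted) if p == label]
    return truths.count(label) / len(truths) if truths else None

def recall(actual, predicted, label):
    """Share of true `label` cases that were predicted as `label`."""
    preds = [p for a, p in zip(actual, predicted) if a == label]
    return preds.count(label) / len(preds) if preds else None

# Toy labels, invented for illustration only.
actual    = ["on-time", "on-time", "on-time", "delayed", "delayed", "on-time"]
predicted = ["on-time", "on-time", "delayed", "delayed", "on-time", "on-time"]

folds = k_fold_indices(len(actual), k=3)  # k=10 in the experiments above
for held_out in folds:
    train = [i for fold in folds if fold is not held_out for i in fold]
    # A classifier would be fit on `train` and scored on `held_out` here.
    print(len(train), len(held_out))

print(precision(actual, predicted, "on-time"))  # 3 of 4 on-time predictions are right
print(recall(actual, predicted, "delayed"))     # 1 of 2 delayed flights is caught
```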

      The precision for on-time prediction increases as the dataset gets larger for all classifiers.

      On the other hand, the recall for delayed flights decreases as the dataset gets larger for almost all classifiers; the decision tree is the exception.

      We also conducted the same experiment with cost-sensitive classifiers. We found a similar trend for precision, but the opposite trend for recall.
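
      One common way to make a classifier cost-sensitive is to predict the label with the lowest expected misclassification cost instead of the most probable label. The sketch below illustrates that rule only; whether the experiments used this rule or another approach (such as reweighting training instances) is not stated, and the cost values and probabilities are hypothetical.

```python
def min_expected_cost_label(probs, cost):
    """Pick the label whose expected misclassification cost is lowest.

    probs: {label: estimated probability that this is the true label}
    cost:  {(true_label, predicted_label): cost}, zero on the diagonal
    """
    def expected_cost(predicted):
        return sum(p * cost[(true, predicted)] for true, p in probs.items())
    return min(probs, key=expected_cost)

# Hypothetical weighted cost matrix: a missed delay costs 5x a false alarm.
WEIGHTED = {
    ("on-time", "on-time"): 0, ("on-time", "delayed"): 1,
    ("delayed", "on-time"): 5, ("delayed", "delayed"): 0,
}
# Uniform costs reproduce the ordinary "most probable label" decision.
UNIFORM = {pair: (0 if pair[0] == pair[1] else 1) for pair in WEIGHTED}

# Hypothetical class probabilities for one flight.
probs = {"on-time": 0.7, "delayed": 0.3}

print(min_expected_cost_label(probs, UNIFORM))   # on-time (cost 0.3 vs 0.7)
print(min_expected_cost_label(probs, WEIGHTED))  # delayed (cost 1.5 vs 0.7)
```

      With the weighted matrix, a flight is labeled delayed even at a 30% estimated delay probability, which pushes borderline cases toward the delayed class.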

      Again, the precision for on-time prediction increases as the dataset gets larger for all classifiers.

      Interestingly, the recall for delayed flights also increases as the dataset gets larger for almost all classifiers; 5NN is the exception.

  • Conclusion for experiment 1:

      In both cases (with and without a weighted cost matrix), the precision for on-time predictions increases as the dataset gets larger for all classifiers.

      However, without the weighted cost matrix, the recall for delayed flights decreases as the dataset gets larger for almost all classifiers (except for the decision tree). In contrast, with the weighted cost matrix, the recall for delayed flights increases for almost all classifiers (except for 5NN).

  • Experiment 2: Performance of cost-sensitive classifiers versus regular classifiers

      The cost-sensitive classifiers (except for ZeroR) achieved higher precision for on-time predictions, as shown below:

      [Figure: precision for on-time predictions, regular vs. cost-sensitive classifiers]

      Similarly, the cost-sensitive classifiers achieved higher recall for delayed flights:

      [Figure: recall for delayed flights, regular vs. cost-sensitive classifiers]

  • Conclusion for experiment 2:

      Our results show that introducing the weighted cost matrix increases both the recall for delayed flights and the precision for on-time predictions for all classifiers except ZeroR. The cost-sensitive ZeroR predicted all flights as delayed, so it made no on-time predictions for which precision could improve.
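
      The ZeroR behavior can be reproduced with a small worked example. This is a sketch under stated assumptions, not the implementation used in the experiments: the `zero_r` helper, the 80/20 class balance, and the 5:1 cost ratio are all hypothetical.

```python
from collections import Counter

def zero_r(labels, cost=None):
    """ZeroR ignores all features and always predicts one fixed label.

    Without a cost matrix it predicts the majority class; with one, this
    sketch predicts the label with the lowest total cost on the training labels.
    """
    counts = Counter(labels)
    if cost is None:
        return counts.most_common(1)[0][0]
    def total_cost(predicted):
        return sum(n * cost[(true, predicted)] for true, n in counts.items())
    return min(counts, key=total_cost)

# Hypothetical class balance: 80 on-time flights, 20 delayed flights.
labels = ["on-time"] * 80 + ["delayed"] * 20

# Hypothetical cost matrix: a missed delay costs 5x a false alarm.
COST = {
    ("on-time", "on-time"): 0, ("on-time", "delayed"): 1,
    ("delayed", "on-time"): 5, ("delayed", "delayed"): 0,
}

plain = zero_r(labels)           # majority class
weighted = zero_r(labels, COST)  # total cost: on-time 100, delayed 80

predictions = [weighted] * len(labels)
on_time_predictions = predictions.count("on-time")
delayed_caught = sum(1 for a, p in zip(labels, predictions)
                     if a == "delayed" and p == "delayed")

print(plain, weighted)                # on-time delayed
print(on_time_predictions)            # 0, so on-time precision is 0/0
print(delayed_caught / labels.count("delayed"))  # recall for delayed flights
```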