Skip to content

rivillasm/MidTermProject

Repository files navigation

Mid-Term Project

This repository cointains all the information you need to work on the Mid-Term project.

Hello and Welcome!!!

The goal is to predict arrival delays of commercial flights. Often, there isn't much airlines can do to avoid the delays, therefore, they play an important role in both profits and loss of the airlines. It is critical for airlines to estimate flight delays as accurate as possible because the results can be applied to both, improvements in customer satisfaction and income of airline agencies.

Files

  • exploratory_analysis.ipynb: this file contains 10 questions we need to answer during the data exploration phase. They will help us to understand the data and become familiar with different variables.
  • modeling.ipynb: this file contains instructions for modeling part of the project. We recommend to split modeling tasks into more notebooks.
  • data_description.md: when you need to look for any information regarding specific attributes in the data this is the file to look in.
  • sample_submission.csv: this file is the example of how the submission of the results should look like.

Data

We will be working with data from air travel industry. We will have four separate tables:

  1. flights: The departure and arrival information about flights in US in years 2018 and 2019.
  2. fuel_comsumption: The fuel comsumption of different airlines from years 2015-2019 aggregated per month.
  3. passengers: The passenger totals on different routes from years 2015-2019 aggregated per month.
  4. flights_test: The departure and arrival information about flights in US in January 2020. This table will be used for evaluation. For submission, we are required to predict delays on flights from first 7 days of 2020 (1st of January - 7th of January). We can find sample submission in file sample_submission.csv

The data are stored in the Postgres database. You can see the information about the hostname and credentials in Compass.

How to Start

You should spend some time with the datasets on your own. Try to look for interesting relationships and information inside the tables. Once you are familiar with the data and information in there you can move on to the EDA tasks from us that will further help you to get familiar with the datasets. Afterward, you can move onto the data preparation and modeling steps. Make sure you have enough time to prepare you slides and presentation at the end.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published