This README describes my exploratory data analysis (EDA) of the countrywide car accident dataset, which covers 49 states of the USA. The accident data were collected from February 2016 to December 2021 using multiple APIs that provide streaming traffic incident (or event) data.
Currently, there are about 2.8 million accident records in this dataset.
Project - Kaggle - Countrywide Traffic Accident Dataset (2016 - 2021) EDA
-> pip install opendatasets --upgrade --quiet
-> import opendatasets as od
**dataset_url = 'https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents'**
**od.download(dataset_url)**
#### Save the file path in the data_filename variable
data_filename = './us-accidents/US_Accidents_Dec21_updated.csv'
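The next step calls df.tail(), but this README does not show the step that loads the CSV into df. The following is a minimal sketch that assumes the file is read with pandas.read_csv. The helper name load_accidents is hypothetical.

```python
import os

import pandas as pd


def load_accidents(path: str) -> pd.DataFrame:
    """Load the accidents CSV into a DataFrame (hypothetical helper)."""
    if not os.path.exists(path):
        # Fail early with a clear message if the Kaggle download has not run yet
        raise FileNotFoundError(f"Dataset not found at {path!r}; run od.download first")
    return pd.read_csv(path)


# Usage, assuming the download above has completed:
# df = load_accidents(data_filename)
# df.tail()
```

The full file has about 2.8 million rows, so loading it can take a while and needs a fair amount of memory.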
#### Inspect the last few records
df.tail()
cities_by_accident[:10].plot(kind="barh")
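The bar chart above uses cities_by_accident, which this README never defines. The sketch below assumes it holds the per-city record counts from the dataset's City column, computed with value_counts. That method sorts in descending order, so [:10] selects the ten cities with the most accidents. The helper name is hypothetical.

```python
import pandas as pd


def count_accidents_by_city(df: pd.DataFrame) -> pd.Series:
    """Return accident counts per city, largest first (assumes a 'City' column)."""
    # value_counts sorts descending by default and drops missing city values
    return df["City"].value_counts()


# Usage:
# cities_by_accident = count_accidents_by_city(df)
# cities_by_accident[:10].plot(kind="barh")
```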
#### US population by city (2022)
#### Most populated states vs. cities in the United States
Here are the five most populated states in the country, with selected cities:
- California (population: 39,613,493); city: Los Angeles (population: 3,971,883, about 4.0 million)
- Texas (population: 29,730,311); cities: Houston (population: 2,307,345, about 2.3 million), Dallas (population: 1,308,814, about 1.3 million)
- Florida (population: 21,944,577); cities: Miami (population: 441,003, about 4.4 lakh), Orlando (population: 309,154, about 3.1 lakh)
- New York (population: 19,299,981)
- Pennsylvania (population: 12,804,123)
Although New York has the highest city population, there are no records for it. Miami has the most accidents even though its population is only about 4.4 lakh, compared with about 4 million for Los Angeles in California. Orlando (population about 3.1 lakh), Dallas (Texas) and Houston (Texas) come next in the ranking.
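The Miami vs. Los Angeles comparison above is easier to judge as accidents per 100,000 residents. This is a minimal sketch using the city populations listed in this README. The accident counts must come from cities_by_accident, and none are hard-coded here. The helper name accidents_per_100k is hypothetical.

```python
import pandas as pd

# City populations as listed in this README
CITY_POPULATION = {
    "Los Angeles": 3_971_883,
    "Houston": 2_307_345,
    "Dallas": 1_308_814,
    "Miami": 441_003,
    "Orlando": 309_154,
}


def accidents_per_100k(counts: pd.Series, population: dict) -> pd.Series:
    """Accidents per 100,000 residents for cities present in both inputs."""
    pop = pd.Series(population, dtype="float64")
    # Keep only cities that have both an accident count and a population figure
    common = counts.index.intersection(pop.index)
    rate = counts[common] / pop[common] * 100_000
    return rate.sort_values(ascending=False)


# Usage:
# accidents_per_100k(cities_by_accident, CITY_POPULATION)
```

Note that the accident totals cover 2016 to 2021, while the populations are 2022 figures. The resulting rate is therefore a rough comparison, not an annual rate.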
high_accident_cities = cities_by_accident[cities_by_accident >= 1000]
less_accident_cities = cities_by_accident[cities_by_accident < 1000]
len(high_accident_cities) is 496, i.e. 496 cities have at least 1,000 accident records.
len(high_accident_cities)/len(cities)
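The steps above split cities at 1,000 accidents and then divide the high group by the total number of cities. This README does not define cities. The sketch below assumes "all cities" means every city present in the per-city counts, i.e. len(cities_by_accident). The function name split_by_threshold and its threshold parameter are hypothetical.

```python
import pandas as pd


def split_by_threshold(counts: pd.Series, threshold: int = 1000):
    """Split per-city accident counts at a threshold and report the high share."""
    high = counts[counts >= threshold]  # cities with at least `threshold` accidents
    low = counts[counts < threshold]    # all remaining cities
    # Assumption: the denominator is every city present in `counts`
    share = len(high) / len(counts) if len(counts) else 0.0
    return high, low, share


# Usage:
# high_accident_cities, less_accident_cities, share = split_by_threshold(cities_by_accident)
# print(f"{share:.1%} of cities have at least 1,000 accidents")
```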
sns.histplot(high_accident_cities, kde=True)
sns.histplot(less_accident_cities, kde=True)
sns.histplot(cities_by_accident, log_scale=True)