Building on the analysis in the previous parts, this part shows how the training dataset was constructed. It combines risk behaviors, school scoring, and school-to-retailer distances to predict, using a Random Forest model, whether a student will start smoking early.
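To make the school-to-retailer distance feature concrete, here is a minimal sketch of one way such a distance could be computed. It assumes that schools and retailers are available as (latitude, longitude) pairs and that great-circle (haversine) distance is an acceptable measure. The function names and the coordinates are hypothetical and are not taken from the project's actual code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_retailer_km(school, retailers):
    """Distance from one school to its closest retailer (hypothetical helper)."""
    return min(haversine_km(school[0], school[1], lat, lon) for lat, lon in retailers)


# Made-up coordinates, for illustration only.
school = (40.70, -74.00)
retailers = [(40.71, -74.00), (40.80, -74.00)]
print(round(nearest_retailer_km(school, retailers), 2))
```

A per-school value like this could then sit alongside the risk-behavior and school-scoring columns as one input to the model. Whether the project used the nearest distance, an average, or a count within a radius is not stated here.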
Part II is geographic: it shows the spatial distribution of high schools and of registered tobacco retailers in NYC. Interactive overviews present both by borough, and the part closes with a geographical analysis of the relation between the two.
YRBSS provides comprehensive records of teenagers’ risk behaviors, covering what you would expect: drinking alcohol, physical fights, exercise and diet, mental health, sexual behaviors, and of course smoking. There is plenty to dig into, so you can easily get a picture of how NYC high school students behave with respect to smoking: how many of them smoke, how often they smoke, and how the number of teenage smokers changes over time.