class: center, middle, inverse, title-slide

# Welcome to Data Science
🙌

---

layout: true

<div class="my-footer">
<span>
<a href="https://datasciencebox.org/" target="_blank">datasciencebox.org</a>
</span>
</div>

---

class: center, middle

# Hello world!

---

## What is data science?

- <i class="fa fa-database fa"></i> + <i class="fa fa-flask fa"></i> = data science?

--

- <i class="fa fa-database fa"></i> + <i class="fa fa-code fa"></i> = data science?

--

- <i class="fa fa-database fa"></i> + <i class="fa fa-user fa"></i> + <i class="fa fa-code fa"></i> = data science?

--

- <i class="fa fa-database fa"></i> + <i class="fa fa-users fa"></i> + <i class="fa fa-code fa"></i> = data science?

--

<br>

Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge. We're going to learn to do this in a `tidy` way -- more on that later!

---

# What is this course?

This is an introductory data science course, with an emphasis on statistical thinking.

--

**Q - What data science background does this course assume?**
A - None.

--

**Q - Is this an intro stat course?**
A - While statistics `\(\ne\)` data science, they are very closely related and have a tremendous amount of overlap. Hence, this course is a great way to get started with statistics. However, this course is **not** your typical high school statistics course.

--

**Q - Will we be doing computing?**
A - Yes.

---

# What is this course?

**Q - Is this an intro CS course?**
A - No, but many themes are shared.

--

**Q - What computing language will we learn?**
A - R.

--

**Q - Why not language X?**
A - We can discuss that over ☕.

---

## Where is this course?
<br><br><br><br><br><br><br>

.large[
.center[
[datasciencebox.org](https://datasciencebox.org/)
]
]

---

class: center, middle

# Data in the wild

---

# Data from wearables

**A year as told by fitbit**

.pull-left[
by Nick Strayer

http://livefreeordichotomize.com/2017/12/27/a-year-as-told-by-fitbit/
]
.pull-right[
![A year as told by fitbit](img/nick-strayer-fitbit.png)
]

---

# R-Ladies chapters

**R-Ladies global tour**

.pull-left[
by Maelle Salmon

http://www.masalmon.eu/2017/10/06/globalrladiestour/
]
.pull-right[
![R Ladies Global Tour](img/maelle-salmon-rladies.png)
]

---

# Trump's tweets

**Text analysis of Trump's tweets confirms he writes only the (angrier) Android half**

.pull-left[
by David Robinson (Stack Overflow)

http://varianceexplained.org/r/trump-tweets/
]
.pull-right[
![Trump tweets](img/david-robinson-trump-tweets.png)
]

---

## Greatest Twitter scheme of all time

<img src="img/bohemian-rhapsody.png" width="381" style="display: block; margin: auto;" />

.small[
https://gist.github.com/mine-cetinkaya-rundel/03d7516dea1e5f2613a5d71c28edb08d
]

---

## Create a GitHub account

.instructions[
Go to [github.com](https://github.com/), and create an account (unless you already have one).
]

Tips for selecting a username:<sup>✦</sup>

.midi[
- Incorporate your actual name.
- Reuse username from other contexts, e.g., Twitter or Slack.
- Pick a username you'll be comfortable revealing to your future boss.
- Shorter is better than longer.
- Be as unique as possible in as few characters as possible.
- Make it timeless. Don’t highlight your current university, employer, etc.
- Avoid words laden with special meaning in programming, like `NA`.
]

.instructions[
Once done, place a green sticky on your laptop. If you have questions, place a pink sticky.
]

.footnote[
<sup>✦</sup> Source: [Happy git with R](http://happygitwithr.com/github-acct.html#username-advice) by Jenny Bryan
]

---

## Join RStudio Cloud

.instructions[
Go to [URL], and log in with your GitHub credentials.
]

.instructions[
Once done, place a green sticky on your laptop. If you have questions, place a pink sticky.
]

---

## <i class="fas fa-laptop"></i> `AE 01 - UN Votes`

Create your first data visualization!

- Log on to RStudio Cloud and click on this course's workspace.
- Make a copy of the project for this application exercise and launch it.
- In the Files pane in the bottom right corner, spot the file called `unvotes.Rmd`. Open it, and then click on the "Knit" button.
- Go back to the file and change your name on top (in the `yaml` -- we'll talk about what this means later) and knit again.
- Then, change the country names to those you're interested in. Your spelling and capitalization should match how the countries appear in the data, so take a peek at the Appendix to confirm spelling. Knit again. And voila, your first data visualization!

.instructions[
Once done, place a green sticky on your laptop. If you have questions, place a pink sticky.
]

---

class: center, middle

# Course structure and policies

---

## Class meetings

- Interactive
- Some lectures, lots of learn-by-doing
- Bring your laptop to class every day

---

## Diversity & Inclusiveness:

.midi[
**Intent:** Students from all diverse backgrounds and perspectives should be well-served by this course, students' learning needs should be addressed both in and out of class, and the diversity that students bring to this class should be viewed as a resource, strength, and benefit. It is my intent to present materials and activities that are respectful of diversity: gender identity, sexuality, disability, age, socioeconomic status, ethnicity, race, nationality, religion, and culture. Let me know ways to improve the effectiveness of the course for you personally, or for other students or student groups.
]

--

- If you have a name and/or set of pronouns that differ from those that appear in your official Duke records, please let me know.
- If you feel your performance is being impacted by your experiences outside of class, please don't hesitate to come and talk with me. If you prefer to speak with someone outside of the course, your academic dean is an excellent resource.
- I (like many people) am still in the process of learning about diverse perspectives/identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to me about it.

---

## How to get help

- Discussion of course content, logistics, etc. happens on the course Slack <i class="fab fa-slack-hash"></i>.
- Please post on the #questions channel instead of direct messaging.
- Use proper formatting: When asking questions involving code, please make sure to use inline code formatting for short bits of code or code snippets for longer, multi-line chunks.
  - Formatting messages: https://get.slack.help/hc/en-us/articles/202288908-Format-your-messages
  - Code snippets: https://get.slack.help/hc/en-us/articles/204145658-Creating-a-Snippet
- It is often a much more pleasant experience to get your questions answered in person. Make use of the teaching team's office hours; we're here to help!
- For personal and grade-related questions, direct message me on Slack or use email.

---

## Tips for asking questions

- First search existing discussion for answers. If the question has already been answered, you're done! If it has already been asked but you're not satisfied with the answer, add to the thread.
- Give your question context from course concepts, not course assignments.
  - Good context: "I have a question on filtering"
  - Bad context: "I have a question on HW 1 question 4"
- Be precise in your description:
  - Good description: "I am getting the following error and I'm not sure how to resolve it - `Error: could not find function "ggplot"`"
  - Bad description: "R giving errors, help me! Aaaarrrrrgh!"
- You can edit a question after posting it.
---

## Tips for asking questions

- Format your questions nicely using markdown and code formatting.
- Where appropriate, provide links to specific files, or even lines within them, in the body of your question. This will help your helper understand your question. Note that only the teaching team will have access to private repos.
- (Optional) Tag someone or some group of people. Start by typing the @ symbol and Slack will generate some suggestions.

---

## Academic integrity

- Only work that is clearly assigned as team work can be completed collaboratively.
- Use of disallowed materials during the take-home exams will not be tolerated.

---

## Sharing/reusing code

- I am well aware that a huge volume of code is available on the web to solve any number of problems.
- Unless I explicitly tell you not to use something, the course's policy is that you may make use of any online resources (e.g. StackOverflow), but you must explicitly cite where you obtained any code you directly use (or use as inspiration).
- Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism.
- On individual assignments you may not directly share code with another student in this class, and on team assignments you may not directly share code with another team in this class.
- Except for the take-home exams, you are welcome to discuss the problems together and ask for advice, but you may not send or make use of code from another team.
- On the take-home exams, all communication with classmates is explicitly forbidden.
---

## Course components:

- Teams: 3-4 person teams, initially based on survey and pretest results, will change throughout the semester
- Application exercises: Usually start in class and finish in teams by the next class period, check/no check
- Homework: Individual, lowest score dropped
- Exams: Individual, two take-home midterms
- Final project: Team, presentations during scheduled final exam time, you must participate in the project and be in class to present to pass this class
- Self-paced tutorials: Individual, check/no check

---

## Grading

- Weights of each component are given in the syllabus.
- Class attendance is a firm expectation; frequent absences or tardiness will be considered a legitimate cause for grade reduction.
- Cumulative numerical averages of 90 - 100 are guaranteed at least an A-, 80 - 89 at least a B-, and 70 - 79 at least a C-; however, the exact ranges for letter grades will be determined after the final exam.
- The more evidence there is that the class has mastered the material, the more generous the curve will be.

---

## Other policies

- Please refrain from texting or using your computer for anything other than coursework during class.
- You must be in class on a day when you're scheduled to present; there are no make-ups for presentations.

---

## Join Slack

- Go to [URL] and join the course Slack.
- Download the Slack app and keep it open.