class: center, middle, inverse, title-slide

.title[
# Comparing texts
]
.subtitle[
## Data Science in a Box
]
.author[
### datasciencebox.org
]

---
layout: true

<div class="my-footer">
<span>
<a href="https://datasciencebox.org" target="_blank">datasciencebox.org</a>
</span>
</div>

---

## What is a document about?

- Term frequency
- Inverse document frequency

`$$idf(\text{term}) = \ln{\left(\frac{n_{\text{documents}}}{n_{\text{documents containing term}}}\right)}$$`

tf-idf is about comparing **documents** within a **collection**
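
As a rough illustration of the formula above, here is a minimal Python sketch that computes tf, idf, and their product for a toy collection. The three documents, the function names, and the choice to define term frequency as a term's share of the words in a document are all assumptions made for this example; they are not taken from the slides.

```python
import math
from collections import Counter

# Hypothetical toy collection, made up for illustration only.
documents = {
    "doc_a": "the cat sat on the mat",
    "doc_b": "the dog sat on the log",
    "doc_c": "the cat chased the dog",
}


def term_frequency(text):
    """Share of a document's words accounted for by each term (assumed definition)."""
    words = text.split()
    counts = Counter(words)
    return {term: count / len(words) for term, count in counts.items()}


def inverse_document_frequency(docs):
    """ln(n_documents / n_documents containing term), as in the formula above."""
    n_documents = len(docs)
    containing = Counter()
    for text in docs.values():
        # set() so each document counts at most once per term
        containing.update(set(text.split()))
    return {term: math.log(n_documents / n) for term, n in containing.items()}


def tf_idf(docs):
    """tf * idf for every term in every document of the collection."""
    idf = inverse_document_frequency(docs)
    return {
        name: {term: tf * idf[term] for term, tf in term_frequency(text).items()}
        for name, text in docs.items()
    }


idf = inverse_document_frequency(documents)
scores = tf_idf(documents)
```

In this toy collection "the" appears in all three documents, so its idf is ln(3/3) = 0 and its tf-idf is 0 everywhere. "mat" appears only in `doc_a`, so it gets the highest score in that document. The same text would score differently inside a different collection, which is why tf-idf compares documents within a collection rather than describing one document alone.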