Statistical classification / Natural language processing / Information retrieval / Data mining / Boilerplate / Search engine indexing / Random access machine / Classification rule / Classifier / Statistics / Information science / Science


Boilerplate Detection using Shallow Text Features
Christian Kohlschütter, Peter Fankhauser, Wolfgang Nejdl
L3S Research Center / Leibniz Universität Hannover, Appelstr. 9a, 30167 Hannover, Germany

Document Date: 2014-06-25 04:22:58


File Size: 674,28 KB

City

Dublin / Computer Vision / Hannover / New York / Marrakech / New York City

Company

K. Vieira A. S. / Google / Link Density (P/C/N) Dim Precision / B. Ribeiro-Neto A. S. / Sixth International Language Resources

Country

Germany / South Africa / United States / Canada / Australia / United Kingdom / Morocco / India

Currency

USD

Event

Product Issues / Product Recall

Facility

National Institute of Standards and Technology / In Building / Lucene IR library

IndustryTerm

Web-scale solution / opinion mining / web page structure / web-page segmentation / web-page cleaning tool / Web corpus / Web content / Web intrapage informative structure mining / power-law / web page templates / Web Corpora / tree-based algorithms / Web browser / data mining / news search engine / Web page segments / Web service / web-page content identification / WEB PAGE FEATURES Feature Levels / opinion mining pipeline / web template content / web page template detection / Web Document Modeling / Web search engines / search precision / Web-as-Corpus / data management / Web information / web search / 1R algorithm / mining / web page segmentation / form factor devices / actual content Web pages / large web-derived corpus / search engine

MarketIndex

set 675

Organization

European Language Resources Association / National Science Foundation / Wolfgang Nejdl L3S Research Center / National Institute of Standards and Technology / T PT / Federal Government / Leibniz Universität Hannover Appelstr

Person

Web / Wolfgang Nejdl

Position

simple and plausible stochastic model for describing the boilerplate creation process / author / Shannon random writer / simple random writer / representative

Product

Lucene IR / Hannover / Leaves / NumWords/LinkDensity

ProgrammingLanguage

HTML

ProvinceOrState

New York

PublishedMedium

the Google News

TVShow

Shannon

Technology

same algorithms / BOILERPLATE Algorithm / decision-tree-based algorithms / search engine / machine learning / HTML / Terms Algorithms / Knowledge Management / Automatic identification / Pasternack algorithm / data mining / BTE algorithm / DOM / document object model / 1R algorithm / evaluated algorithms
