dc.contributor.author | Shah, Himat | |
dc.contributor.author | Rezaei, Mohammad | |
dc.contributor.author | Fränti, Pasi | |
dc.contributor.editor | Tavares, João Manuel RS | |
dc.date.accessioned | 2020-02-10T11:44:26Z | |
dc.date.available | 2020-02-10T11:44:26Z | |
dc.date.issued | 2019 | |
dc.identifier.uri | https://erepo.uef.fi/handle/123456789/8021 | |
dc.description.abstract | We present D-rank, an unsupervised, language- and domain-independent method for automatically extracting keywords from a single web page. The method does not use any corpus and relies only on the information and features of the web page itself, including the page URL, word frequency, title, hyperlinks, and headers, which are extracted from the DOM tree of the page. Words are assigned different scores according to their importance, which is determined by their positions in the web page. Experimental results on web pages in three different languages show the effectiveness of the proposed method. | |
dc.language.iso | English | |
dc.publisher | ACM Press | |
dc.relation.ispartof | AIIPCC '19: Proceedings of the International Conference on Artificial Intelligence, Information Processing and Cloud Computing | |
dc.relation.uri | http://dx.doi.org/10.1145/3371425.3371495 | |
dc.rights | In copyright 1.0 | |
dc.title | DOM-based keyword extraction from web pages | |
dc.description.version | final draft | |
dc.contributor.department | School of Computing, activities | |
uef.solecris.id | 67745320 | en |
dc.type.publication | Articles and abstracts in scientific conference proceedings | |
dc.relation.doi | 10.1145/3371425.3371495 | |
dc.description.reviewstatus | peerReviewed | |
dc.relation.articlenumber | 62 | |
dc.relation.isbn | 978-1-4503-7633-4 | |
dc.rights.accesslevel | openAccess | |
dc.type.okm | A4 | |
uef.solecris.openaccess | No | |
dc.rights.copyright | © Association for Computing Machinery | |
dc.type.displayType | article | en |
dc.type.displayType | artikkeli | fi |
dc.rights.url | https://rightsstatements.org/page/InC/1.0/ | |