DOM-based keyword extraction from web pages
Self-archived version
final draft
Date
2019
Author(s)
Shah, Himat
Rezaei, Mohammad
Fränti, Pasi
Unique identifier
10.1145/3371425.3371495
Citation
Shah, Himat; Rezaei, Mohammad; Fränti, Pasi (2019). DOM-based keyword extraction from web pages. AIIPCC '19: Proceedings of the International Conference on Artificial Intelligence, Information Processing and Cloud Computing, 62. 10.1145/3371425.3371495.
Rights
© Association for Computing Machinery
Abstract
We present D-rank, an unsupervised, language- and domain-independent method for automatically extracting keywords from a single web page. The method does not use any corpus and relies only on the information and features available on the web page itself, including the page URL, word frequency, title, hyperlinks, and headers, which are extracted from the DOM tree of the page. Each word is scored according to its importance, which is determined by the positions in which it appears on the web page. Experimental results on web pages in three different languages show the effectiveness of the proposed method.
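
The general idea of position-based scoring described in the abstract can be illustrated with a minimal sketch. This is not the D-rank algorithm itself: the abstract does not give the actual weights or scoring formula, so the `WEIGHTS`, `URL_WEIGHT`, and `BODY_WEIGHT` values, the three-letter minimum word length, and the names `PositionScorer` and `extract_keywords` are all assumptions made for illustration. The sketch uses only the Python standard library and treats the tag stack seen while parsing as a stand-in for a word's position in the DOM tree.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical position weights; the abstract does not state the real values.
WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "h3": 2.0, "a": 1.5}
URL_WEIGHT = 3.0    # assumed bonus for page words that also occur in the URL
BODY_WEIGHT = 1.0   # assumed weight for text in any other element
SKIP = {"script", "style"}  # text inside these elements is not page content


def tokenize(text):
    # Lowercased runs of Unicode letters; the length >= 3 cutoff is an
    # assumed, language-neutral substitute for a stop-word list.
    return [w for w in re.findall(r"[^\W\d_]+", text.lower()) if len(w) >= 3]


class PositionScorer(HTMLParser):
    """Accumulates a score per word based on the enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags, outermost first
        self.scores = Counter()  # word -> accumulated score

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, discarding unclosed children
        # such as <br> or <img>.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIP for t in self.stack):
            return
        # Use the highest weight among the enclosing tags.
        weight = max(
            (WEIGHTS.get(t, BODY_WEIGHT) for t in self.stack),
            default=BODY_WEIGHT,
        )
        for word in tokenize(data):
            # Summing per occurrence also folds in word frequency.
            self.scores[word] += weight


def extract_keywords(html, url="", k=5):
    scorer = PositionScorer()
    scorer.feed(html)
    scorer.close()
    # Boost words that appear both on the page and in its URL.
    for word in set(tokenize(url)):
        if word in scorer.scores:
            scorer.scores[word] += URL_WEIGHT
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scorer.scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]
```

Calling `extract_keywords(page_html, page_url, k=10)` returns the ten highest-scoring words. Because the sketch needs no corpus and no language-specific resources, it reflects the single-page, unsupervised setting the abstract describes, although a real implementation would need tuned weights and more careful candidate filtering.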