Show simple item record

dc.contributor.author Kanervisto, Anssi, University of Eastern Finland
dc.date.accessioned 2018-04-16T10:34:56Z
dc.date.available 2018-04-16T10:34:56Z
dc.date.issued 2016-07-16T14:02:59+00:00
dc.identifier.other 10.5281/zenodo.56198 en
dc.identifier.uri https://erepo.uef.fi/handle/123456789/6372
dc.description.abstract A prebuilt dataset for OpenAI's image-2-latex task. Includes a total of ~100k formulas and images, split into train, validation and test sets. Formulas were parsed from the LaTeX sources provided here: http://www.cs.cornell.edu/projects/kddcup/datasets.html (originally from arXiv). Each image is a PNG image of fixed size. The formula is in black and the rest of the image is transparent. For related tools (e.g. a tokenizer), check out this repository: https://github.com/Miffyli/im2latex-dataset For pre-made evaluation scripts and a built im2latex system, check this repository: https://github.com/harvardnlp/im2markup Newlines used in formulas_im2latex.lst are UNIX-style newlines (\n). Reading the file with any other newline convention yields a slightly wrong number of lines (104563 instead of 103558) and thus breaks the structure used by this dataset. Python 3.x opens text files in universal newlines mode by default, which also treats \r and \r\n as line breaks; to avoid this, the file must be opened with newline="\n" (e.g. open("formulas_im2latex.lst", newline="\n")).
dc.relation.uri https://zenodo.org/record/56198
dc.subject im2latex
dc.subject latex
dc.subject tex
dc.subject formula
dc.subject openai
dc.title im2latex-100k, arXiv:1609.04938
dc.relation.doi 10.5281/zenodo.56198
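
The newline caveat in the abstract can be illustrated with a minimal sketch. The helper name read_formulas and the latin-1 encoding are assumptions made for illustration; the record states neither. The demo writes a tiny stand-in file rather than using the real formulas_im2latex.lst.

```python
import os
import tempfile


def read_formulas(path, encoding="latin-1"):
    """Return one formula per UNIX newline (hypothetical helper).

    newline="\n" makes Python split lines on "\n" only, so a stray "\r"
    stays inside its formula instead of starting a new line.
    The latin-1 default is an assumption: it decodes any byte sequence,
    but the record does not state the file's actual encoding.
    """
    with open(path, newline="\n", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


# Tiny stand-in file: three formulas, the second contains a stray "\r".
with tempfile.NamedTemporaryFile("wb", suffix=".lst", delete=False) as tmp:
    tmp.write(b"\\frac{1}{2}\nx \r y\nz^2\n")
    demo_path = tmp.name

strict = read_formulas(demo_path)

# Default mode (universal newlines) also breaks lines at "\r".
with open(demo_path, encoding="latin-1") as handle:
    naive = handle.read().splitlines()

os.remove(demo_path)

print(len(strict), len(naive))
```

Because line numbers in formulas_im2latex.lst are what the dataset's structure relies on, the line count from the strict reader is the one to compare against the 103558 figure given in the abstract.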


Files in this item

There are no files associated with this item.
