> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient in terms of performance per vision token
> covers 100 languages
Great work. The best parts are the interpretability and the speed. @tomaarsen, I am planning to fine-tune a model for text-to-code retrieval with the setup below. Could you tell me whether it looks like a reasonable starting point, or whether there is anything I can tune to do better? The goal is to do decently on text-to-code retrieval and to evaluate on CoIR (https://github.com/CoIR-team/coir).
Training dataset: claudios/code_search_net, filtered to Python code. The query is the code's docstring and the passage is the code itself. Loss: SparseMultipleNegativesRankingLoss. I can't think of a decent dev evaluation; should I use SparseTripletEvaluator? Also, are (query, positive passage) pairs enough? I believe the negatives will be all the other passages in the same batch, or do I have to prepare the data explicitly (mine negatives)? Please advise.
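
To make the in-batch negatives question concrete, here is a minimal plain-Python sketch of the idea behind a multiple-negatives ranking loss: each query is scored against every passage in the batch, and the passage at the same index is treated as the only positive. This is an illustration only, not the sentence-transformers implementation; the function name `in_batch_negatives_loss`, the `scale` default, and the toy vectors are all assumptions made for the example.

```python
import math


def in_batch_negatives_loss(query_vecs, passage_vecs, scale=1.0):
    """Mean cross-entropy where passage i is the positive for query i
    and every other passage in the batch acts as a negative.

    Hypothetical helper for illustration; not a library function.
    """
    assert len(query_vecs) == len(passage_vecs)
    losses = []
    for i, q in enumerate(query_vecs):
        # Dot-product score of query i against every passage in the batch.
        scores = [scale * sum(a * b for a, b in zip(q, p)) for p in passage_vecs]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability assigned to the matching passage.
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)


# Toy vectors standing in for sparse embeddings (made-up values).
queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
aligned = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # passage i matches query i
swapped = [aligned[1], aligned[0]]            # positives deliberately misaligned

loss_aligned = in_batch_negatives_loss(queries, aligned)
loss_swapped = in_batch_negatives_loss(queries, swapped)
print(loss_aligned < loss_swapped)
```

In this sketch no negatives are passed in explicitly: the other passages in the batch fill that role, which is why (query, positive) pairs are enough to compute the loss. Mined hard negatives would be an optional addition on top of this, not a requirement.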