Martin Potthast
Professor at the University of Kassel, https://hessian.AI, and https://ScaDS.AI. Member of @webis.de
Research in information retrieval #IR, natural language processing #NLP, and artificial intelligence.
- No cheating, repost the most recent picture of your pet(s)
- Reposted by Martin Potthast: 🧑🔬 I’m recruiting PhD students in Natural Language Processing @unileipzig.bsky.social Computer Science, together with @scadsai.bsky.social! Topics include, but aren’t limited to: 🔎 Linguistic Interpretability, 🌍 Multilingual Evaluation, 📖 Computational Typology. Please share! #NLProc #NLP
- Reposted by Martin Potthast: We just released "German Commons", the largest openly-licensed German text dataset for LLM training: 154B tokens with clear usage rights for research and commercial use. huggingface.co/datasets/coral-nlp/german-commons
- Reposted by Martin Potthast: 🌟 Really excited to share the fourth Strategic Workshop on Information Retrieval (SWIRL) report published in SIGIR Forum! Paper 👉🏻 www.johannetrippas.com/papers/tripp... More info 👉🏻 sites.google.com/view/swirl20... #SWIRL2025 #SIGIR2026 #IR #GenAI #Research #CHIIR2026
- Reposted by Martin Potthast: Thrilled to announce that Matti Wiegmann has successfully defended his PhD! 🎉🧑🎓 Huge congratulations on this incredible achievement! #PhDDefense #AcademicMilestone
- Reposted by Martin Potthast: Honored to win the ICTIR Best Paper Honorable Mention Award for "Axioms for Retrieval-Augmented Generation"! Our new axioms are integrated with ir_axioms: github.com/webis-de/ir_... Nice to see axiomatic IR gaining momentum.
- Reposted by Martin Potthast: We presented two papers at ICTIR 2025 today: (1) Axioms for Retrieval-Augmented Generation webis.de/publications... (2) Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins webis.de/publications...
- Reposted by Martin Potthast: Want to know how to make bi-encoders more than 3x faster with a new backbone encoder model? Check out our talk on the Token-Independent Text Encoder (TITE) at #SIGIR2025 in the efficiency track. It pools vectors within the model to improve efficiency. dl.acm.org/doi/10.1145/...
- Reposted by Martin Potthast: Now @fschlatt.bsky.social presents "TITE: Token-Independent Text Encoder for Information Retrieval" at #SIGIR2025. Paper: webis.de/publications...
- Reposted by Martin Potthast: Here are some impressions from our ReNeuIR workshop on "Reaching Efficiency in Neural IR" that we had yesterday at #SIGIR2025.
- Reposted by Martin Potthast: Happy to share that our paper "The Viability of Crowdsourcing for RAG Evaluation" received the Best Paper Honourable Mention at #SIGIR2025! Very grateful to the community for recognizing our work on improving RAG evaluation. 📄 webis.de/publications...
- Reposted by Martin Potthast: Lukas Gienapp presents "The Viability of Crowdsourcing for RAG Evaluation" at #SIGIR2025. The paper is available at: webis.de/publications...
- Reposted by Martin Potthast: @mrparryparry.bsky.social presenting our work on reproducing TREC DL 2019 judgements and the implications for evaluating modern ranking models on modern collections. Paper: arxiv.org/abs/2502.20937
- Reposted by Martin Potthast: Thank you Carlos for the shout-out of Lightning IR in the LSR tutorial at #SIGIR2025. If you want to fine-tune your own LSR models, check out our framework at github.com/webis-de/lig...
- Reposted by Martin Potthast: From July 13-17, 2025, @scadsai.bsky.social will join the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval in Padua, Italy. Our researchers have made the following contributions. Learn more about #SIGIR2025: 👉 sigir2025.dei.unipd.it
- Reposted by Martin Potthast: Do not forget to participate in the #TREC2025 Tip-of-the-Tongue (ToT) Track :) The corpus and baselines (with run files) are now available and easily accessible via the ir_datasets API and the HuggingFace Datasets API. More details are available at: trec-tot.github.io/guidelines
- Reposted by Martin Potthast: Our paper on self-distillation for training bi-encoders got accepted at #ICTIR2025! By exploiting pretrained encoder capabilities, our approach eliminates expensive teacher models and batch sampling while maintaining the same effectiveness.
- Reposted by Martin Potthast: Most reporting on AI examines worst-case systems deployed under the guise of efficiency. But what would a good faith effort at Ethical AI look like? For two years, we’ve been looking over the shoulder of a city trying to do things differently.
- Reposted by Martin Potthast: All @acm.org publications will be 100% Open Access as of January 2026. When we announced this at POPL and CHI this year, conference participants spontaneously erupted in applause. The CS community is excited about ACM's move to OA!
- Reposted by Martin Potthast: The deadline for submissions to the ReNeuIR workshop at #SIGIR2025 is extended to June 10 😸 Details: reneuir.org #ReNeuIr2025 #SIGIR25
- Reposted by Martin Potthast: PAN 2025 Call for Participation: Shared Tasks on Authorship Analysis, Computational Ethics, and Originality. We'd like to invite you to participate in the following shared tasks at PAN 2025 held in conjunction with the CLEF conference in Madrid, Spain. Find out more at pan.webis.de/clef25/pan25...
- Reposted by Martin Potthast: We share your concern that LLMs could be prompted to generate responses that are biased in favor of certain products. That is why we are currently organizing a shared task on detecting advertisements in the responses of RAG-based search engines: bsky.app/profile/webi...
- Can LLM-generated ads be blocked? With OpenAI adding shopping options to ChatGPT, this question gains further importance. If you are interested in contributing to the research on LLM-based advertising, please check out our shared task: touche.webis.de/clef25/touch... More details below.
- The fourth edition of ReNeuIR @ #SIGIR2025 is back!! Check reneuir.org to see what we have in mind this year! Paper submission deadline: May 20, 2025.
- Reposted by Martin Potthast: New AI ethics scandal brewing... turns out a team at University of Zurich had dozens of undisclosed AI bot accounts debating with people on /r/ChangeMyView from November 2024 to March 2025 simonwillison.net/2025/Apr/26/...
- Reposted by Martin Potthast: 📢 The Internet Archive needs your help. At a time when information is being rewritten or erased online, a $700 million lawsuit from major record labels threatens to destroy the Wayback Machine. Tell the labels to drop the 78s lawsuit. 👉 Sign our open letter: www.change.org/p/defend-the... 🧵⬇️
- Reposted by Martin Potthast: The Workshop on Open Web Search at #ECIR2025 is just starting with a keynote by @claclarke.bsky.social on Annotative Indexing. #WOWS25 #WOWS2025 #ECIR25
- Reposted by Martin Potthast: Honored to receive the best short paper award and best paper honourable mention award at #ECIR2025. Thank you to all co-authors @maik-froebe.bsky.social, @hscells.bsky.social, Shengyao Zhuang, @bevankoopman.bsky.social, Guido Zuccon, Benno Stein, @martin-potthast.com, @matthias-hagen.bsky.social 🥳
- Reposted by Martin Potthast: 📢 Our paper "The Viability of Crowdsourcing for RAG Evaluation" has been accepted to #SIGIR2025! We compared how good humans and LLMs are at writing and judging RAG responses, assembling 1800+ responses across 3 styles, and 47K+ pairwise judgments in 7 quality dimensions. 🧵➡️
- Reposted by Martin Potthast: 1. For the past thirty years I've had the best job in the world. I've had the opportunity to follow my curiosity; explore the workings of nature and society; mentor students and junior colleagues in the same process; and teach generations of students about it all.
- Reposted by Martin Potthast: Important Dates: Now: training data released; May 23, 2025: software submission; May 30, 2025: participant paper submission; June 27, 2025: peer review notification; July 07, 2025: camera-ready participant papers submission; Sep 09-12, 2025: conference.
- Reposted by Martin Potthast: 4. Generative Plagiarism Detection. Given a pair of documents, your task is to identify all contiguous maximal-length passages of reused text between them. pan.webis.de/clef25/pan25...
- Reposted by Martin Potthast: 3. Multi-Author Writing Style Analysis. Given a document, determine at which positions the author changes. pan.webis.de/clef25/pan25...
- Reposted by Martin Potthast: 2. Multilingual Text Detoxification. Given a toxic piece of text, rewrite it in a non-toxic way while preserving the main content as much as possible. pan.webis.de/clef25/pan25...
- Reposted by Martin Potthast: 1. Voight-Kampff Generative AI Detection. Subtask 1: Given a (potentially obfuscated) text, decide whether it was written by a human or an AI. Subtask 2: Given a document collaboratively authored by human and AI, classify the extent to which the model assisted. pan.webis.de/clef25/pan25...
- Reposted by Martin Potthast: Interested in joining our research group or do you know someone who might be interested? We have a new vacancy: Research position at the Webis group on Watermarking for Large Language Models. More information: webis.de/for-students...
- Reposted by Martin Potthast: Nearly there now - just a few hundred more days to go.
- Reposted by Martin Potthast: 2nd International Workshop on Open Web Search: CfP. We invite you to the #ECIR2025 Workshop on Open Web Search #wows2025. Please consider submitting to the scientific track or the WOWS-Eval shared task to enrich the Open Web Index with relevance judgments. Details: opensearchfoundation.org/wows2025
- Reposted by Martin Potthast: My advisor warned me that academics trend towards bitterness. He encouraged me to intentionally resist this, remember where I came from, and never forget the privilege of getting to spend a life working with knowledge and ideas. He too said that bitterness and resentment are easy.
- Analyzing game boards ain't ChatGPT's thing, yet. (German conversation, alt texts in English) The game is "Mensch ärgere dich nicht", a dice game in which the aim is to move your own four pieces around the board without being thrown back to square 1 by the pieces of the other players ... 1/4
- ... who move to your own position(s). Despite being almost entirely a game of luck, it's surprisingly funny, known to be just as upsetting on occasion, and reasonably well balanced until the end. Here, yellow is in the lead, but GPT recognizes it only partially. 2/4
- Yellow won. GPT does not see it. Potentially, the perspective plays a role. 3/4
- Apparently, it can be made to second-guess itself quite easily. 4/4
- Reposted by Martin Potthast: Before you all delete your accounts on X, you should consider deleting content but "donating" them to science. Many institutions, such as @gesis-dataservices.bsky.social might use them to scrape more effectively than via burner accounts.
- Reposted by Martin Potthast: 🐣 New release: small-text v2.0.0.dev1. With Small Language Models on the rise, the new version of small-text has been long overdue! Despite the generative AI hype, many real-world tasks still rely on supervised learning, which is reliant on labeled data. #activelearning #nlproc #nlp #llms
- Time for a starter pack on information retrieval: go.bsky.app/MXPJoTn
- Reposted by Martin Potthast: Hello, Computational linguistics/NLP world in Bluesky! We're creating the same accounts as on other social media platforms in Bluesky! #NLProc
- Reposted by Martin Potthast: Today we will present our poster on Query Variation Robustness of Transformer Models at #EMNLP2024. You can find us at the Information Retrieval and Text Mining 3 poster session at #EMNLP2024.
- Reposted by Martin Potthast: @bsky.app is there a way to follow all the people someone is following with a click, or make a starter pack from them? Would be a very fast way to create big networks when onboarding.
- Tomorrow's introductory lecture on IR will be fun: We'll discuss examples of situations where retrieval systems succeed and fail. Here's a nice little example of news retrieval and how RAG systems fail at it. More research is needed if they are going to be used for any type of question.
- Reposted by Martin Potthast: 🔵 News: ReNeuIR Workshop is back at #SIGIR2024! » Call for papers: reneuir.org/cfp.html » Shared task on efficient neural IR: reneuir.org/shared_task.... Come participate/present/network with a growing IR research sub-community excited about efficient neural retrieval.
- How will conversational search AI pay for itself? It may be native ads or product placement in generated answers. At #CHIIR2024 next week, we'll present a user study showing that many people don't recognize ads inserted by LLMs in generated search results: webis.de/publications... #mlsky
- Reposted by Martin Potthast: How does Wikipedia decide whether a scientist should be mentioned in an article that is not about them? 🤷 We call this the problem of "micro-notability", and we've studied how Wikipedia editors deal with it in two articles on CRISPR/Cas9: journals.sagepub.com/doi/10.1177/...