• Our interaction with large language models (LLMs) is becoming increasingly important as they are integrated into various aspects of our lives. These models are remarkable for their zero-shot capabilities in natural language tasks, but they can also generate nonsensical or unfaithful content, known as hallucinations, which raises concerns about their trustworthiness. In a recent research article, a team of researchers from the Department of Computer Science at ETH Zurich, consisting of Niels Mündler, Jingxuan He, Slobodan Jenko, and Martin Vechev, presents an approach to evaluating, detecting, and mitigating self-contradictory hallucinations in LLMs.
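
The paper’s pipeline is prompting-based, so the core idea fits in a few lines. Below is a minimal sketch of the trigger-and-check step, assuming only a generic text-in/text-out `llm` callable; the prompts are invented for illustration, not the paper’s exact templates:

```python
def detect_self_contradiction(llm, context):
    """Sample two sentences for the same context, then ask the model
    itself whether they contradict; a 'yes' flags a self-contradiction."""
    s1 = llm(f"{context}\nContinue with one factual sentence:")
    s2 = llm(f"{context}\nContinue with one factual sentence:")
    verdict = llm(
        "Do these two sentences contradict each other? Answer yes or no.\n"
        f"1. {s1}\n2. {s2}\nAnswer:"
    )
    return verdict.strip().lower().startswith("yes"), (s1, s2)
```

Mitigation then amounts to asking the model to revise a flagged pair into a single consistent statement.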

  • Unlocking the full potential of large language models (LLMs) for text rewriting becomes tangible with the introduction of RewriteLM, a powerful and innovative language model designed specifically to tackle complex text rewriting challenges. Developed by Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Canoee Liu, Simon Tong, Jindong Chen, and Lei Meng, this state-of-the-art model aims to mitigate the limitations of traditional LLMs by improving control and reducing unintended content generation in text rewriting tasks.

  • A recent research article reveals an interesting similarity between human language processing and artificial intelligence language models: both display a phonological bias, specifically a preference for consonants over vowels when identifying words. Lead researcher Juan Manuel Toro, of the Institució Catalana de Recerca i Estudis Avançats (ICREA), analyzed the language processing patterns of OpenAI’s ChatGPT and discovered that it mimics humans’ consonant bias across different languages.

  • A recent research article presents a groundbreaking approach called dynamic context pruning, which improves the efficiency and interpretability of autoregressive Transformers in large language models (LLMs). The research, conducted by Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, and Thomas Hofmann, researchers affiliated with ETH Zürich, CSEM, and the University of Basel, highlights how their approach can prune up to 80% of the context without significant performance degradation, leading to better memory and computational efficiency.
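
In the paper the pruning decisions are learned end to end; the sketch below, with stand-in importance scores supplied by the caller, only illustrates the cache-pruning step itself:

```python
import numpy as np

def prune_context(keys, values, importance, keep_ratio=0.2):
    """Drop the least important cached tokens, keeping `keep_ratio` of them.

    keys, values: (seq_len, dim) arrays of cached attention states.
    importance:   (seq_len,) scores; learned in the paper, supplied here.
    """
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.sort(np.argsort(importance)[-k:])  # top-k, original order
    return keys[keep], values[keep]

# Toy demonstration: prune 80% of a 10-token cache.
rng = np.random.default_rng(0)
keys, values = rng.normal(size=(10, 4)), rng.normal(size=(10, 4))
importance = rng.uniform(size=10)
pk, pv = prune_context(keys, values, importance, keep_ratio=0.2)
print(pk.shape)  # (2, 4): attention now runs over 2 tokens instead of 10
```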

  • A recent research article presents a groundbreaking multimodal language model called ChatBridge, which has the incredible ability to connect various modalities using language as a catalyst. Developed by Zijia Zhao, Longteng Guo, Tongtian Yue, Sihan Chen, Shuai Shao, Xinxin Zhu, Zehuan Yuan, and Jing Liu from the Institute of Automation, Chinese Academy of Sciences, and Bytedance Inc., the model shows excellent promise towards significant advances in multimodal artificial intelligence research.

  • Researchers from the AI for Science Institute (Beijing, China), the Renmin University of China Libraries, and the School of Information at Renmin University of China have developed a book recommendation system called BookGPT using large language models (LLMs) like ChatGPT, showcasing promising results in various book recommendation tasks. The study opens new opportunities for applying AI to book recommendation and, more broadly, to the library and information science field.

  • As AI-generated code becomes more prevalent in software development, concerns over legal and ethical challenges grow. A recent study by researchers from Seoul National University and NAVER AI Lab presents an innovative watermarking method called Selective WatErmarking via Entropy Thresholding (SWEET) that improves both the quality of watermarked code and the detection of machine-generated code.
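
As a rough illustration of the entropy-thresholding idea (the thresholds, green-list construction, and toy five-token vocabulary below are invented, not the paper’s settings): the watermark bias is applied only where the model is genuinely uncertain, so low-entropy positions, which are common in code, stay untouched and correctness is preserved.

```python
import numpy as np

def watermark_step(logits, prev_token, entropy_threshold=2.0,
                   green_fraction=0.5, bias=2.0):
    """One decoding step of entropy-thresholded watermarking (illustrative)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    entropy = -(probs * np.log(probs + 1e-12)).sum()
    if entropy < entropy_threshold:
        return logits  # model is (nearly) certain: skip the watermark
    rng = np.random.default_rng(prev_token)      # green list seeded by context
    green = rng.random(len(logits)) < green_fraction
    return logits + bias * green                 # nudge green tokens upward

logits = np.log(np.array([0.30, 0.25, 0.20, 0.15, 0.10]))
print(watermark_step(logits, prev_token=42, entropy_threshold=1.0))
```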

  • In a recent research article, Jaemin Cho, Abhay Zala, and Mohit Bansal from UNC Chapel Hill introduce two novel visual programming frameworks, VPGEN and VPEVAL, designed to improve text-to-image (T2I) generation and evaluation. By breaking down the T2I generation process into manageable and interpretable steps, these frameworks could revolutionize how we understand and analyze AI-generated images.

  • Researchers from DAMO Academy, Alibaba Group, and Nanyang Technological University have developed a novel framework that combines the strengths of large language models (LLMs) and Python solvers to effectively tackle intricate temporal reasoning problems. The framework demonstrates significant improvements in performance on temporal question-answering benchmarks, showcasing its ability to address complex time-bound problems more accurately.
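
A toy illustration of the division of labor, with an invented question: the LLM’s job is to translate the question into code like the snippet below, and the Python interpreter then does the date arithmetic deterministically instead of the model guessing at it.

```python
from datetime import date

# Question: "Alice started a job on 20 March 2015 and left exactly
# 6 years and 4 months later. When did she leave?"
start = date(2015, 3, 20)
months = start.month - 1 + 6 * 12 + 4        # months elapsed since year start
end = date(start.year + months // 12, months % 12 + 1, start.day)
print(end)  # 2021-07-20
```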

  • A recent research article sheds light on how large language models (LLMs) process and store information related to arithmetic reasoning. Authored by Alessandro Stolfo, Yonatan Belinkov, and Mrinmaya Sachan, the paper presents a mechanistic interpretation of LLMs for answering arithmetic-based questions using a causal mediation analysis framework. This groundbreaking approach offers fresh insights into the specific components of LLMs involved in arithmetic reasoning, opening up new possibilities for future research and AI capabilities.

  • Recent research by Guhao Feng, Yuntian Gu, Bohang Zhang, and Haotian Ye from Peking University has revealed the incredible potential of using Chain-of-Thought (CoT) prompting in Large Language Models (LLMs) for improving their reasoning and mathematical capabilities. The study provides valuable insights into the importance of CoT and its future applications in AI.
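
For readers new to CoT, here is a minimal, invented prompt contrast: the demonstration answer spells out its intermediate steps, and the model then imitates that pattern on the new question instead of jumping straight to an answer.

```python
# A standard prompt asks for the answer directly; a CoT prompt adds a
# worked example so the model emits intermediate steps before answering.
# (Illustrative prompt; any chat-completion API could consume it.)
cot_prompt = """\
Q: A pen costs $3 and a notebook costs $5. How much do 2 pens and 3 notebooks cost?
A: 2 pens cost 2 * 3 = 6 dollars. 3 notebooks cost 3 * 5 = 15 dollars.
   Together that is 6 + 15 = 21 dollars. The answer is 21.

Q: A ticket costs $7 and popcorn costs $4. How much do 3 tickets and 2 popcorns cost?
A:"""
print(cot_prompt)
```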

  • A recent research article, titled “ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind,” proposes a dataset and evaluation tasks that test the ability of large language models (LLMs) to perform Theory of Mind (ToM) tasks. The authors, researchers from The Graduate Center CUNY, the Toyota Technological Institute at Chicago, and the Basque Center on Cognition, argue that their results show the need for improvement in the consistency of AI models when it comes to ToM tasks.

  • In a recent research article, a team of researchers from the University of Edinburgh discovered that Large Language Models (LLMs) fail to recognize identifier swaps in Python code generation tasks. As the model size increases, the models tend to become more confident in their incorrect predictions. This surprising finding goes against the commonly observed trend of increased prediction quality with increasing model size, and raises questions on LLMs’ true understanding of content and their applicability in tasks that deviate from their training data.
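
The failure mode is easiest to see with a small probe in the spirit of the paper (this particular snippet is invented). After the swap below, the name `print` computes lengths and `len` writes output, so a model that completes code as if the names kept their usual meanings gets `f` wrong:

```python
len, print = print, len   # identifier swap (illustrative probe)

def f(items):
    # Correct under the swap: `print` is now bound to builtin len,
    # so this returns the number of items.
    return print(items)

assert f([1, 2, 3]) == 3
len("swap respected")      # this now *prints*, since len is bound to print
```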

  • A recent research article titled Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples reveals that large language models (LLMs) exhibit robust generalization abilities to longer, wider, and compositional proofs. Conducted by researchers affiliated with New York University and Google, the study systematically measures LLMs’ general deductive reasoning skills to gain insights into their generalization capabilities.

  • A recent research article titled “STAR: Boosting Low-Resource Event Extraction by Structure-to-Text Data Generation with Large Language Models” has showcased a cutting-edge method to significantly improve low-resource event extraction, utilizing large language models for synthetic data generation. The method, known as STAR, was developed by researchers from the Department of Computer Science and the Department of Anthropology at the University of California, Los Angeles.

  • A recent research article titled Sentiment Analysis in the Era of Large Language Models: A Reality Check investigates the capability of large language models (LLMs) like ChatGPT in performing sentiment analysis tasks. The authors, affiliated with DAMO Academy (Alibaba Group), the University of Illinois at Chicago, and Nanyang Technological University, Singapore, found that these LLMs demonstrated satisfactory performance in simpler tasks but lagged behind in more complex tasks requiring deeper understanding or structured sentiment information. The study also proposes a novel benchmark, SENTIEVAL, for a more comprehensive and realistic evaluation of LLMs for sentiment analysis.

  • Researchers from National Taiwan University have recently proposed a novel framework called SELF-ICL that enables zero-shot in-context learning (ICL) by generating its own demonstrations instead of relying on existing training data. This groundbreaking method bridges the gap between large language models and real-world situations, showing potential in improving zero-shot performance.
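
A minimal sketch of the three-step loop, assuming a generic text-in/text-out `llm` callable; the prompts paraphrase the idea rather than reproduce the paper’s exact templates:

```python
def self_icl(llm, task_instruction, test_input, k=3):
    """Zero-shot ICL that manufactures its own demonstrations (sketch)."""
    # Step 1: ask the model to invent k inputs resembling the test input.
    pseudo_inputs = llm(
        f"{task_instruction}\nExample input: {test_input}\n"
        f"Write {k} more plausible inputs for this task, one per line:"
    ).strip().splitlines()[:k]

    # Step 2: label each pseudo-input zero-shot.
    demos = [(x, llm(f"{task_instruction}\nInput: {x}\nOutput:"))
             for x in pseudo_inputs]

    # Step 3: answer the real input with the self-generated demonstrations.
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    return llm(f"{task_instruction}\n{shots}\nInput: {test_input}\nOutput:")
```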

  • Researchers from The University of Tokyo have created SciReviewGen, a large-scale dataset that paves the way for automatic literature review generation using natural language processing and advanced models.

  • A recent study by Evangelos Pournaras from the School of Computing, University of Leeds, alerts the scientific community to the ethical challenges and impacts AI language models, like ChatGPT, bring to science and research. This blog highlights the findings of the study, which suggests recommendations for research ethics boards to establish more responsible research conduct with AI language models.

  • In a recent research article, a team of experts at the MIT Computer Science and Artificial Intelligence Laboratory and the CUHK Centre for Perceptual and Interactive Intelligence proposed a new approach called search-augmented instruction learning (SAIL) that enhances language models’ performance. The researchers fine-tuned the LLaMA-7B model on a novel search-grounded training set, allowing the SAIL model to overcome the transparency and obsolescence limitations faced by conventional language models. Let’s dive into how the SAIL model can revolutionize the field of artificial intelligence.

  • In a recent study, researchers from Shanghai Jiao Tong University, Hong Kong Polytechnic University, Beijing University of Posts and Telecommunications, and other institutions have developed a groundbreaking method, RefGPT, for generating truthful and customized dialogues using large language models, like GPT-3.5 and GPT-4. This new approach can efficiently create enormous dialogue datasets with minimal hallucination, and it adds detailed controls to achieve high customization.

  • Recent research titled “Reasoning with Language Model is Planning with World Model” by a team of researchers from the University of Florida and Mohamed bin Zayed University of Artificial Intelligence showcases their revolutionary method of incorporating strategic planning capabilities into large language models (LLMs). This improves the reasoning proficiency of LLMs and can bring AI capabilities closer to how humans think and plan.

  • Researchers from the University of California Santa Barbara have developed a novel approach called LLM-PO that improves Large Language Models’ (LLMs) ability to handle interactive tasks without the need for gradient access or extensive demonstrations. This breakthrough has the potential to enhance the performance of state-of-the-art models like GPT-4 in solving complex tasks that require interaction and reasoning.

  • A recent research article presents an innovative method, Para-Ref, which utilizes large language models (LLMs) to paraphrase a single reference into multiple high-quality diverse expressions in a way that improves the correlation with human evaluation for several automatic evaluation metrics. The researchers contributing to this paper are affiliated with institutions such as Renmin University of China, The Chinese University of Hong Kong, ETH Zürich, and Microsoft Research Asia China.
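
A sketch of the idea, assuming a generic single-reference `metric` and an `llm` paraphraser; taking the max over references is one simple aggregation choice, not necessarily the paper’s exact one:

```python
def para_ref_score(metric, llm, candidate, reference, n=4):
    """Score a candidate against LLM-paraphrased references (sketch).

    `metric(candidate, reference) -> float` is any single-reference metric
    (e.g. sentence-level BLEU); `llm` is a text-in/text-out callable.
    """
    refs = [reference] + [
        llm(f"Paraphrase, preserving meaning: {reference}") for _ in range(n)
    ]
    return max(metric(candidate, r) for r in refs)
```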

  • Traditional language models hold a wealth of knowledge about our world, but their static nature can limit their ability to adapt to new information. A recent research article from Stanford University researchers Nathan Hu, Eric Mitchell, Christopher D. Manning, and Chelsea Finn presents a new method—Context-aware Meta-learned Loss Scaling (CaMeLS)—that allows large language models to learn from new data streams more effectively. This approach represents a remarkable improvement over existing online fine-tuning techniques and opens up new possibilities for the future of AI.

  • Researchers at the University of Massachusetts Amherst and AWS AI Labs have made valuable progress in the field of task-oriented semantic parsing by enhancing the performance of large language models (LLMs) through in-context learning and mitigating constraint violations. Their study focuses on improving how these models translate natural language into machine-interpretable programs that adhere to API specifications, paving the way for novel advancements in AI capabilities.

  • Have you ever wondered if it’s possible to pinpoint whether a piece of text was written by a human or generated by a large language model like OpenAI’s GPT-3? Researchers Kangxi Wu, Liang Pang, Huawei Shen, Xueqi Cheng, and Tat-Seng Chua from the Institute of Computing Technology, Chinese Academy of Sciences, and the Sea-NExT Joint Lab, National University of Singapore, have proposed an efficient, secure, and scalable detection tool called LLMDet to tackle this challenge. The method calculates the proxy perplexity of text by using the prior information of the model’s next-token probabilities, which are obtained during pre-training, making it both fast and secure.
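
A toy sketch of the proxy-perplexity computation, with an invented bigram “prior dictionary”; the real system profiles a specific LLM’s next-token statistics at far larger scale, which is what lets it score text without querying the model live:

```python
import math

# Invented next-token priors, as if sampled from a specific LLM offline.
ngram_next_prob = {
    ("the", "cat"): {"sat": 0.4, "ran": 0.2},
    ("cat", "sat"): {"on": 0.7},
}

def proxy_perplexity(tokens, priors, floor=1e-4):
    """Perplexity estimated from stored n-gram priors, not live logits."""
    nll = 0.0
    for i in range(2, len(tokens)):
        context = (tokens[i - 2], tokens[i - 1])
        p = priors.get(context, {}).get(tokens[i], floor)
        nll -= math.log(p)
    return math.exp(nll / max(1, len(tokens) - 2))

print(proxy_perplexity("the cat sat on".split(), ngram_next_prob))  # ~1.89
```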

  • A team of researchers from Peking University has developed a large language model, Lawyer LLaMA, specifically designed for the legal domain. The model aims to overcome challenges that existing models face in understanding and applying legal knowledge to address practical issues. By fine-tuning the model with legal domain data and integrating a retrieval module to generate more reliable responses, the team has significantly improved the model’s performance in the legal domain.

  • A group of researchers has made a significant advancement in harnessing the power of Large Language Models (LLMs) to improve the personalization of recommender systems. These researchers, affiliated with several organizations, have recently published an article titled “Large Language Models for User Interest Journeys” that introduces a framework for personalized extraction of interest journeys by using LLMs to summarize those journeys. This innovation allows recommendation platforms to better understand and cater to individual user interests.

  • Researchers from the Consumer Health Research Team have recently been exploring how Large Language Models (LLMs) can be utilized in making meaningful inferences on health-related tasks, focusing on grounding these models using physiological and behavioral time-series data. The central finding of their latest article demonstrates that few-shot tuning of LLMs can indeed be used in health applications, such as cardiac signal analysis, physical activity recognition, metabolic calculation, and mental health screening.

  • A recent research article has explored the incredible potential of large language models (LLMs) in table-to-text generation tasks. The study, conducted by a team of researchers from Yale University and the Technical University of Munich, focused on understanding how LLMs can generate natural language statements from structured data, such as tables. This breakthrough could pave the way for more advanced and efficient table-to-text generation systems, revolutionizing how we access and comprehend complex tabular data.

  • A recent research article from DAMO Academy, Alibaba Group, has explored whether GPT-4, a large language model (LLM), is capable of performing data analysis on par with professional human data analysts. The study conducted a series of head-to-head comparative experiments to measure GPT-4’s performance against that of human data analysts, utilizing a framework designed to prompt GPT-4 to perform end-to-end data analysis tasks.

  • A recent research article proposes Inference-Time Policy Adapters (IPA), a resource-efficient method for tailoring large language models (LLMs) without the need for fine-tuning. The IPA framework can improve performance in tasks like text generation, dialogue safety control, and reducing toxicity without relying on costly fine-tuning processes. The authors of this research are affiliated with the Allen Institute for Artificial Intelligence, University of Washington, and University of Southern California.
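
Schematically, decoding combines the frozen base model’s logits with a small trained adapter policy’s logits and renormalizes; the sketch below shows only that combination step (training the adapter, for example with RL against the target objective, is the other half of the method):

```python
import numpy as np

def ipa_decode_step(base_logits, adapter_logits, alpha=1.0):
    """Combine a frozen base LM with a small adapter policy (sketch).

    Product-of-distributions view: log p = base + alpha * adapter,
    then renormalize. The base model is never fine-tuned.
    """
    logits = base_logits + alpha * adapter_logits
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

base = np.array([2.0, 1.0, 0.5])      # frozen LLM's next-token logits
adapter = np.array([-1.0, 0.5, 1.5])  # lightweight policy's correction
print(ipa_decode_step(base, adapter)) # tailored distribution, no fine-tuning
```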

  • A recent research article explores the development of HuatuoGPT, a cutting-edge language model designed specifically for medical consultation. The authors, who are all affiliated with the Shenzhen Research Institute of Big Data and The Chinese University of Hong Kong, Shenzhen, have successfully combined real-world doctor data with pre-existing language models such as ChatGPT to create a solution that surpasses previous models in most cases. HuatuoGPT holds promise for the future of AI-driven healthcare, paving the way for equitable access to high-quality medical care.

  • Advancements in Large Language Models (LLMs) have led to impressive performance on various reasoning tasks. However, researchers Daman Arora and Himanshu Gaurav Singh have introduced a new, more challenging benchmark, JEEBench, designed to test the problem-solving abilities of LLMs. Their evaluation shows promising results for GPT-4 but also highlights areas for improvement.

  • Researchers from Georgia Institute of Technology and Monash University have developed an exceptional model named LOGICLLAMA. It harnesses the power of Large Language Models (LLMs) to translate natural language statements into first-order logic rules, surpassing the performance of GPT-3.5! This comes with methodological improvements and potential implications for future artificial intelligence research in logical reasoning.

  • Recent research by Jiayan Guo, Lun Du, and Hengyu Liu goes beyond the conventions of natural language processing with large language models like ChatGPT and delves into the realm of understanding graph-structured data. With graph data pervading numerous fields like social network analysis, bioinformatics, and recommender systems, the authors investigate how well these models perform on a diverse range of structural and semantic-related tasks involving graph data.
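
A typical probe of this kind serializes the graph as text before asking a structural question; the format below is illustrative rather than the authors’ exact template:

```python
# Encode the graph as an edge list in plain language, then ask a
# structural question the model must answer from the serialized form.
edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave")]

def graph_prompt(edges, question):
    lines = [f"{u} is connected to {v}." for u, v in edges]
    return "Graph:\n" + "\n".join(lines) + f"\nQuestion: {question}"

print(graph_prompt(edges, "Is there a path from Alice to Dave?"))
```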

  • Advancements in artificial intelligence have brought us closer to building smarter, more capable systems. To keep up with this progress, a team of researchers from Microsoft Research and U.C. Berkeley has developed Gorilla, a fine-tuned model that significantly outperforms GPT-4 in writing accurate API calls while also adapting to test-time document changes. This breakthrough demonstrates the model’s ability to minimize hallucination issues, ultimately improving the reliability and applicability of large language models (LLMs) in various tasks.

  • Research from the Computer Science Division at the University of California, Berkeley introduces a state-of-the-art system, Ghostbuster, which detects AI-generated texts with high accuracy. With the recent advancements in large language models (LLMs) like ChatGPT, it has become increasingly challenging to distinguish human-written text from AI-generated content. Ghostbuster aims to address this problem, outperforming previous approaches and offering datasets for detection benchmarks.

  • Researchers from the University of Toronto have discovered an improved method of generating synthetic data with optimal faithfulness across multiple domains in a recent research article titled “Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science”. This technique improves upon conventional large language model-generated data, which often lacks topical or stylistic authenticity, by implementing strategies like grounding, filtering, and taxonomy-based generation.

  • In this fascinating research article, scientists from the University of Toronto and Vector Institute propose new algorithms that enable large language models (LLMs) to learn from prompts while protecting the privacy of sensitive data involved. This cutting-edge research paves the way for the continuing development of AI technologies without compromising on user privacy.

  • A recently published research article introduces a framework for event semantic processing, evaluating how large language models (LLMs) understand, reason, and make predictions about events. The team of authors from Peking University proposes a new benchmark, EVEVAL, accompanied by noteworthy findings that indicate a need for further evaluation in this area.

  • Imagine enhancing large language models (LLMs) with a more effective approach that is simple yet competitive with state-of-the-art methods. This is precisely what researchers from Microsoft Research Asia and Tsinghua University have achieved with their new method, Iterative Retrieval-Generation Synergy (ITER-RETGEN). In a recent study, the authors demonstrated that their ITER-RETGEN method yields up to 8.6% absolute gains on four out of six datasets for complex tasks like multi-hop question answering, fact verification, and commonsense reasoning. Their work signifies a significant leap forward in the architecture and capabilities of retrieval-augmented language models.
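
A minimal sketch of the synergy loop, assuming generic `retrieve` and `llm` callables; the key move is feeding each round’s generated answer back into the next retrieval query:

```python
def iter_retgen(llm, retrieve, question, iterations=2):
    """Iterative retrieval-generation synergy (sketch).

    Each round retrieves with the previous answer appended to the query,
    so facts surfaced during generation guide the next retrieval.
    `retrieve` returns a list of passage strings; `llm` is text-in/text-out.
    """
    answer = ""
    for _ in range(iterations):
        query = f"{question} {answer}".strip()
        passages = "\n".join(retrieve(query))
        answer = llm(f"Context:\n{passages}\n\nQuestion: {question}\nAnswer:")
    return answer
```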

  • In a recent study by researchers at the National Key Laboratory for Novel Software Technology at Nanjing University, China, and ByteDance, the translation ability of large-scale pretrained language models (LLMs) was explored, leading to the discovery that their potential was even greater than previously thought. Through a method called Multilingual Finetuning with Translation Instructions (mFTI), the researchers were able to improve translation performance for various languages compared to existing approaches.

  • In a recent research article, a team of researchers from Korea University and NAVER AI Lab has proposed an innovative neural architecture named Contrastive Reading Model (Cream) to enhance AI’s understanding of text-rich images. Cream is designed to push Large Language Models (LLMs) into the visual domain by capturing intricate details within images and bridging the gap between language and vision understanding. The rigorous evaluations presented in the article showcase Cream’s state-of-the-art performance in visually-situated language understanding tasks, providing insights into future possibilities in the AI and computer vision world.

  • In recent years, the demand for effective sentence representation learning has been on the rise, as it plays a crucial role in various AI tasks. A new research article sheds light on an innovative framework called SynCSE, focusing on contrastive learning of sentence embeddings entirely from scratch. Authored by Junlei Zhang and Zhenzhong Lan from Zhejiang University, and Junxian He from Shanghai Jiao Tong University, the article delves deep into the implications of this novel approach and how it significantly outperforms existing unsupervised techniques while achieving results comparable to supervised models.

  • Researchers from the Faculty of Informatics at Masaryk University have been making strides in enriching chain-of-thought datasets requiring arithmetical reasoning with the integration of nonparametric components, such as calculators. In their recent experiment, the researchers developed a machine-processable HTML-like format to enable more efficient integration between large language models (LLMs) and symbolic systems, improving arithmetical reasoning capabilities in AI.
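
A toy version of the tag-based integration, with tag names modeled on (but simplified from) the paper’s HTML-like format: the controller spots a calculator call in the model’s output, evaluates it, and splices the result back into the context before generation resumes.

```python
import re

CALL = re.compile(r'<gadget id="calculator">(.*?)</gadget>')

def run_gadgets(model_output):
    """Evaluate calculator calls and append their <output> elements."""
    def evaluate(match):
        expr = match.group(1)
        result = eval(expr, {"__builtins__": {}})  # demo only: sandbox properly
        return f'{match.group(0)}<output>{result}</output>'
    return CALL.sub(evaluate, model_output)

print(run_gadgets('The ratio is <gadget id="calculator">355 / 113</gadget>.'))
# -> ...<gadget id="calculator">355 / 113</gadget><output>3.1415929203539825</output>.
```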

  • The world of artificial intelligence has witnessed a groundbreaking improvement in strict zero-shot hierarchical classification. Researchers from the Department of Electrical and Computer Engineering at Queen’s University, Canada, and the Rakuten Institute of Technology, USA, have proposed a new framework that significantly enhances the performance of large language models in a strict zero-shot classification setting. The article, titled “A Simple and Effective Framework for Strict Zero-Shot Hierarchical Classification,” showcases the effective use of entailment-contradiction prediction in conjunction with large language models.
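
A hedged sketch of the entailment trick using Hugging Face’s zero-shot classification pipeline, which wraps an off-the-shelf NLI model; the two-level hierarchy and labels here are invented, and the paper’s framework handles strictness and contradiction more carefully:

```python
from transformers import pipeline

# Entailment-based zero-shot classification: the NLI model scores how well
# the text entails "This example is about <label>."
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
hierarchy = {"sports": ["tennis", "soccer"], "science": ["physics", "biology"]}

text = "The striker scored twice in the second half."
parent = nli(text, list(hierarchy))["labels"][0]   # entailment over parents
child = nli(text, hierarchy[parent])["labels"][0]  # then over its children
print(parent, "->", child)  # expected: sports -> soccer
```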

  • In a recent research article titled “A RELENTLESS Benchmark for Modelling Graded Relations between Named Entities,” the authors from Cardiff NLP at Cardiff University investigate how Large Language Models (LLMs) can rank entity pairs based on how well they satisfy given relations. The researchers propose a new benchmark to evaluate the models’ ability to grasp the complex, graded relationships that exist between named entities – a breakthrough that could have broad implications for AI in fields such as financial NLP and music recommendation systems.

  • Researchers from the University of Massachusetts Amherst have developed an innovative dyadic zero-shot event extraction (EE) approach to identify actions between actor pairs. This technique outperforms existing methods by addressing challenges such as word sense ambiguity, modality mismatch, and low efficiency. The new fine-grained, multistage generative question-answer method performs well on the Automatic Content Extraction dataset and requires significantly fewer queries compared to previous approaches.