dc.contributor.advisor | Protopapas, Pavlos | |
dc.contributor.advisor | Wang, Hongming | |
dc.contributor.author | Marsh, Tanner | |
dc.date.accessioned | 2024-05-18T12:02:17Z | |
dash.embargo.terms | 2025-05-17 | |
dc.date.created | 2024 | |
dc.date.issued | 2024-05-17 | |
dc.date.submitted | 2024 | |
dc.identifier.citation | Marsh, Tanner. 2024. Natural Language Search for NASA ADS. Master's thesis, Harvard University Division of Continuing Education. | |
dc.identifier.other | 31294003 | |
dc.identifier.uri | https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37378607 | * |
dc.description.abstract | The NASA Astrophysics Data System (ADS) is a critical resource for researchers and students in astronomy, astrophysics, and beyond. ADS indexes a vast collection of papers and other scholarly literature, which researchers can search via the ADS website or API. ADS’s database is powered by Apache Solr, which lets users formulate highly expressive and precise search queries over more than 50 searchable fields. However, this sophistication comes at the cost of usability: to fully exploit ADS’s search features, users must first familiarize themselves with Solr and ADS’s documentation. This thesis proposes a way to make ADS more accessible: a chat application in which users request papers in natural language rather than by constructing Solr queries. The application uses state-of-the-art (SOTA) transformer-based large language models (LLMs) to translate natural language requests into Solr queries, simplifying user interaction with the ADS database without compromising the precision of search results. In this work, we use in-context learning (ICL) with retrieval-augmented generation (RAG) to enhance the LLM’s translation capabilities, which significantly improves translation performance. | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dash.license | LAA | |
dc.subject | Astrophysics Data System (ADS) | |
dc.subject | few-shot learning | |
dc.subject | in-context learning (ICL) | |
dc.subject | retrieval-augmented generation (RAG) | |
dc.subject | Solr | |
dc.subject | text-to-sql | |
dc.subject | Computer science | |
dc.subject | Artificial intelligence | |
dc.subject | Astronomy | |
dc.title | Natural Language Search for NASA ADS | |
dc.type | Thesis or Dissertation | |
dash.depositing.author | Marsh, Tanner | |
dash.embargo.until | 2025-05-17 | |
dc.date.available | 2024-05-18T12:02:17Z | |
thesis.degree.date | 2024 | |
thesis.degree.grantor | Harvard University Division of Continuing Education | |
thesis.degree.level | Masters | |
thesis.degree.name | ALM | |
dc.type.material | text | |
thesis.degree.department | Extension Studies | |
dash.author.email | tannerjmarsh@gmail.com | |