To procedure prolonged context prompts correctly, versions need robust recall abilities. The 'Needle Within a Haystack' (NIAH) evaluation actions a model's power to accurately remember info from the large corpus of data. We Improved the robustness of this benchmark by utilizing amongst thirty random needle/question pairs for each prompt and testing