skills/sciverse-academic-retrieval/SKILL.md
Citation-grade academic literature retrieval (search, semantic chunks, byte-range read, figure fetch) over Sciverse, an open scientific platform indexing peer-reviewed and preprint papers.
npx skillsauth add InternScience/scp sciverse-academic-retrieval
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Connects to the Sciverse SCP Server via the SCP Hub MCP gateway to perform citation-grade scientific literature retrieval over a corpus that includes peer-reviewed papers (Nature, Cell, …), preprints (arXiv, bioRxiv, …) and other academic sources.
The server exposes 5 tools designed for RAG by autonomous research agents:
structured metadata search, natural-language semantic chunk retrieval,
byte-range source-text reading, figure/table image fetching, and schema
catalog listing. Search results carry stable doc_id / chunk_id values for
reproducible citation.
import asyncio
import json
import base64

from mcp.client.streamable_http import streamablehttp_client
from mcp import ClientSession


class SciverseClient:
    """Sciverse SCP Server client (5 academic-retrieval tools).

    All requests are transparently proxied by the SCP Hub to the Sciverse backend.
    Authentication uses the SCP-HUB-API-KEY header (your SCP Platform key).
    """

    def __init__(self, server_url: str, api_key: str):
        self.server_url = server_url
        self.api_key = api_key
        self.session = None

    async def connect(self):
        """Open the streamable-HTTP transport and initialize an MCP session."""
        try:
            self.transport = streamablehttp_client(
                url=self.server_url,
                headers={"SCP-HUB-API-KEY": self.api_key},
            )
            self.read, self.write, self.get_session_id = await self.transport.__aenter__()
            self.session_ctx = ClientSession(self.read, self.write)
            self.session = await self.session_ctx.__aenter__()
            await self.session.initialize()
            return True
        except Exception as e:
            print(f"[sciverse] connect failed: {e}")
            return False

    async def disconnect(self):
        """Close the session first, then the underlying transport."""
        if self.session:
            await self.session_ctx.__aexit__(None, None, None)
        if hasattr(self, "transport"):
            await self.transport.__aexit__(None, None, None)

    def parse_text_result(self, result):
        """Extract concatenated text from a tool result's content blocks.

        Works for: search_papers, semantic_search, read_content, list_catalog.
        Returns: str (the tool's JSON payload as text).
        """
        if isinstance(result, dict):
            content_list = result.get("content") or []
        else:
            content_list = getattr(result, "content", []) or []
        texts = []
        for item in content_list:
            if isinstance(item, dict):
                if item.get("type") == "text":
                    texts.append(item.get("text") or "")
            else:
                if getattr(item, "type", None) == "text":
                    texts.append(getattr(item, "text", "") or "")
        return "".join(texts)

    def parse_image_result(self, result):
        """Extract a figure/table image (used by get_resource).

        Returns: dict with keys 'mime_type' (e.g. 'image/png') and 'bytes'
        (decoded binary). Returns None if the result is not an image.
        """
        if isinstance(result, dict):
            content_list = result.get("content") or []
        else:
            content_list = getattr(result, "content", []) or []
        for item in content_list:
            data = item.get("data") if isinstance(item, dict) else getattr(item, "data", None)
            mime = item.get("mimeType") if isinstance(item, dict) else getattr(item, "mimeType", None)
            type_ = item.get("type") if isinstance(item, dict) else getattr(item, "type", None)
            if type_ == "image" and data:
                return {"mime_type": mime, "bytes": base64.b64decode(data)}
        return None


SERVER_URL = "https://scp.intern-ai.org.cn/api/v1/mcp/43/Sciverse"
API_KEY = "<YOUR_SCP_HUB_API_KEY>"


async def main():
    client = SciverseClient(SERVER_URL, API_KEY)
    if not await client.connect():
        print("connect failed")
        return
    try:
        # 1. Structured search: recent transformer papers
        result = await client.session.call_tool(
            "search_papers",
            arguments={
                "query": "transformer attention",  # BM25 over title/abstract/journal
                "year_from": 2023,
                "page_size": 5,
            },
        )
        papers = json.loads(client.parse_text_result(result))
        print(f"search_papers hits: {len(papers.get('hits', []))}")

        # 2. Semantic search: RAG-style chunk retrieval
        result = await client.session.call_tool(
            "semantic_search",
            arguments={"query": "How does transformer attention work?", "top_k": 3},
        )
        chunks = json.loads(client.parse_text_result(result))
        for hit in chunks.get("hits", []):
            print(f"  - {hit['title']} (score={hit['score']:.3f}, doc_id={hit['doc_id']})")

        # 3. Read content: expand context around a known offset
        if chunks.get("hits"):
            first = chunks["hits"][0]
            result = await client.session.call_tool(
                "read_content",
                arguments={"doc_id": first["doc_id"], "offset": first["offset"], "limit": 4096},
            )
            text_window = json.loads(client.parse_text_result(result))
            print(f"read_content next_offset={text_window.get('next_offset')} more={text_window.get('more')}")

        # 4. List catalog: discover available filter fields and operators
        result = await client.session.call_tool(
            "list_catalog", arguments={"include_sample_values": False},
        )
        catalog = json.loads(client.parse_text_result(result))
        print(f"available filter fields: {len(catalog.get('fields', []))}")

        # 5. Get resource: fetch a figure referenced inside read_content's Markdown
        #    (Only call after read_content returned a Markdown snippet with an image reference.)
        # result = await client.session.call_tool(
        #     "get_resource", arguments={"file_name": "figures/fig-3.png"},
        # )
        # image = client.parse_image_result(result)
        # if image:
        #     from pathlib import Path
        #     Path("fig-3.png").write_bytes(image["bytes"])
    finally:
        await client.disconnect()


if __name__ == "__main__":
    asyncio.run(main())
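The commented-out step 5 above writes the image to a hard-coded .png name. Since parse_image_result also returns the MIME type, a small helper could pick the extension from it. This is only a sketch: save_image and the EXTENSIONS map are illustrative names that are not part of the skill, and the set of image types the server actually returns is an assumption.

```python
from pathlib import Path

# Illustrative map (an assumption, not from the skill); unknown types fall back to ".bin".
EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def save_image(image: dict, stem: str, directory: Path) -> Path:
    """Write image['bytes'] to directory/<stem><ext>, choosing <ext> from image['mime_type']."""
    suffix = EXTENSIONS.get(image.get("mime_type") or "", ".bin")
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{stem}{suffix}"
    target.write_bytes(image["bytes"])
    return target
```

With the client above, this would be called as save_image(client.parse_image_result(result), "fig-3", Path("figures_out")) once the result has been checked for None.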
search_papers
Structured metadata search by author, journal, year, subject, etc. Use when
the user knows specific filter values ("Hinton's papers from 2020-2023",
"Nature papers on CRISPR"). Do not use for free-text Q&A — that's
semantic_search.
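As a rough illustration, the two example requests above could be turned into search_papers argument dicts as follows. The parameter names (query, authors, journals, year_from, year_to, page_size) and the page_size cap of 50 come from this tool's parameter list; build_search_papers_args is a hypothetical helper, and how the server matches author strings such as "Hinton" is an assumption.

```python
from typing import Any, Optional

MAX_PAGE_SIZE = 50  # documented upper bound for page_size


def build_search_papers_args(
    query: Optional[str] = None,
    authors: Optional[list[str]] = None,
    journals: Optional[list[str]] = None,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    page_size: int = 10,
) -> dict[str, Any]:
    """Assemble a search_papers arguments dict, omitting filters that were not set."""
    if year_from is not None and year_to is not None and year_from > year_to:
        raise ValueError("year_from must not exceed year_to")
    candidates = {
        "query": query,
        "authors": authors,
        "journals": journals,
        "year_from": year_from,
        "year_to": year_to,
    }
    # Drop unset or empty filters so only meaningful keys are sent.
    args = {key: value for key, value in candidates.items() if value not in (None, [], "")}
    # Clamp page_size into the documented 1..50 range.
    args["page_size"] = max(1, min(page_size, MAX_PAGE_SIZE))
    return args


# "Hinton's papers from 2020-2023" (author spelling is an assumption)
hinton_args = build_search_papers_args(authors=["Hinton"], year_from=2020, year_to=2023)
# "Nature papers on CRISPR"; the oversized page_size is clamped to 50
crispr_args = build_search_papers_args(query="CRISPR", journals=["Nature"], page_size=80)
```

Either dict can be passed as the arguments of a call_tool("search_papers", ...) request.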
query (str, optional): BM25 keyword search over title/abstract/journal
title_contains (str, optional): Substring match on title
abstract_contains (str, optional): Substring match on abstract
authors (list[str], optional): Matches any of these authors
year_from / year_to (int, optional): Publication year range (inclusive)
journals (list[str], optional): Journal names (any match)
subjects (list[str], optional): Subject classification (e.g. "biology")
sort_by_year (str, default "desc"): desc / asc / none
page (int, default 1), page_size (int, default 10, max 50)
filters_advanced (list, optional): Escape hatch with full operator set
(FILTER_OP_EQ, IN, CONTAINS, GTE, LTE, …) for fields not surfaced above
Returns {hits: [...], total: int}, where each hit has
doc_id, title, author, abstract, publication_venue_name,
publication_published_year.

semantic_search
Natural-language semantic search returning relevant paper chunks for
RAG-style answering. Use for free-text questions ("How does attention
work?"). Typical chain: semantic_search → pick chunk → read_content.
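The "pick chunk" step of the chain above can be sketched as two small pure helpers. The hit keys (chunk_id, doc_id, score, offset) follow the semantic_search response described in this section. The helper names and sample hits are illustrative, and the sketch assumes that a higher score means a better match, which this document does not state.

```python
from typing import Any, Optional


def pick_chunk(hits: list[dict[str, Any]], min_score: float = 0.0) -> Optional[dict[str, Any]]:
    """Return the highest-scoring hit at or above min_score, or None if there is none."""
    eligible = [hit for hit in hits if hit.get("score", 0.0) >= min_score]
    return max(eligible, key=lambda hit: hit.get("score", 0.0), default=None)


def read_args_for(hit: dict[str, Any], limit: int = 4096) -> dict[str, Any]:
    """Build read_content arguments that start at the chunk's byte offset."""
    return {"doc_id": hit["doc_id"], "offset": hit["offset"], "limit": limit}


# Hypothetical hits, shaped like the documented semantic_search response.
sample_hits = [
    {"chunk_id": "c1", "doc_id": "d1", "score": 0.42, "offset": 1200, "title": "A"},
    {"chunk_id": "c2", "doc_id": "d2", "score": 0.87, "offset": 300, "title": "B"},
]
best = pick_chunk(sample_hits, min_score=0.5)
next_call = read_args_for(best) if best else None
```

Keeping the selection separate from the network call makes the threshold easy to tune without touching the client code.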
query (str, required): Natural-language query, 1-200 words optimal
top_k (int, default 10, max 30)
source_types (list[str], optional): Filter by web / pdf
mode (str, default "balanced"): fast (~200ms, keyword only) /
balanced (~600ms, hybrid) / quality (~2-4s, LLM-rewrite + hybrid)
Returns {hits: [...]}, where each hit has
chunk_id, doc_id, chunk (the matched text), score, title,
offset (byte offset into the source doc; pass it to read_content for expansion).

read_content
Read a UTF-8 byte range of a paper's source text. Typically called with a
doc_id/offset returned by semantic_search to expand context (read more
bytes before or after a chunk for fuller answers).
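Reading "before or after a chunk" amounts to a little byte arithmetic, sketched below. The 16384-byte cap is the documented maximum for limit. The helper name context_window and its default padding are illustrative. The sketch assumes the hit's chunk text is the exact source text, so that its UTF-8 length equals the chunk's byte span. How the server treats a window that starts in the middle of a multi-byte character is not covered by this document.

```python
MAX_LIMIT = 16384  # documented upper bound for read_content's limit


def context_window(offset: int, chunk_text: str, before: int = 1024, after: int = 1024) -> dict[str, int]:
    """Compute offset/limit covering `before` bytes ahead of a chunk and `after` bytes past it."""
    chunk_bytes = len(chunk_text.encode("utf-8"))  # offsets are bytes, not characters
    start = max(0, offset - before)  # never read before the start of the document
    end = offset + chunk_bytes + after
    return {"offset": start, "limit": min(end - start, MAX_LIMIT)}
```

For a hit returned by semantic_search, the window would be merged into the call as {"doc_id": hit["doc_id"], **context_window(hit["offset"], hit["chunk"])}.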
doc_id (str, required): Paper ID from search_papers / semantic_search
offset (int, default 0): Byte offset to start reading
limit (int, default 4096, max 16384): Bytes to read
Returns {text: str, bytes_returned: int, next_offset: int, more: bool}.
The Markdown text may contain figure references (image references); pass the
referenced file_name to get_resource to fetch the image.

get_resource
Fetch the binary bytes of a paper figure / table image referenced inside
read_content's Markdown. Use when the user asks to see / describe a figure
and read_content output contains an image reference.
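Because get_resource only accepts a relative file_name (it must not contain .. or start with /), a client-side check before the call can save a round trip. The sketch below takes the constraint literally, so any name containing ".." is rejected even when it is not a path segment. is_safe_file_name is an illustrative name, and any further rules the server applies are unknown.

```python
def is_safe_file_name(file_name: str) -> bool:
    """Check the documented get_resource constraints: no '..' and no leading '/'."""
    if not file_name or file_name.startswith("/"):
        return False
    return ".." not in file_name
```

For example, "figures/fig-3.png" passes, while "/etc/passwd" and "figures/../secret.png" are rejected.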
file_name (str, required): Relative path taken from the Markdown image reference.
Must not contain .. or start with /.
Returns data (base64) + mimeType (image/*).
Multimodal agents (Claude, GPT-4V, Gemini, …) can read it directly.

list_catalog
Returns the schema catalog for search_papers: every field name, type,
whether it's filterable / sortable / default-returned, human description, and
applicable filter operators. Use when constructing precise search_papers
filters or facing an ambiguous field need.
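One way to use the catalog when constructing filters is to look up which operators a field accepts before building a filters_advanced entry. The sketch relies only on the per-field keys documented for this tool (name, filterable, applicable_operators). The helper name and sample catalog are hypothetical, and real field names and operator spellings should be taken from an actual list_catalog response.

```python
from typing import Any


def operators_for(catalog: dict[str, Any], field_name: str) -> list[str]:
    """Return the operators allowed on a filterable field, or [] if unknown or not filterable."""
    for field in catalog.get("fields", []):
        if field.get("name") == field_name:
            if not field.get("filterable", False):
                return []
            return list(field.get("applicable_operators", []))
    return []


# Hypothetical catalog excerpt for illustration only.
sample_catalog = {
    "fields": [
        {"name": "publication_published_year", "type": "integer",
         "filterable": True, "applicable_operators": ["FILTER_OP_EQ"]},
        {"name": "abstract", "type": "string",
         "filterable": False, "applicable_operators": []},
    ]
}
year_ops = operators_for(sample_catalog, "publication_published_year")
```

An empty list signals that the field should not be used in a filter, either because it is missing from the catalog or because it is not filterable.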
include_sample_values (bool, default false): If true, also fetch the
top-20 values for enum-like fields (cached for 24h; the first call takes ~100s of ms).
Returns {fields: [...]}, where each field has name, type
(string/integer/list[string]/…), filterable, sortable,
default_return, description, applicable_operators, and optionally
sample_values.

Chain semantic_search → read_content to assemble citation-grounded
summaries with stable doc_id references for verifiability.
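When a summary needs more text than one read_content call returns, the next_offset and more fields of its response allow a simple paging loop. The sketch below takes the reader as an injected async callable so it can run without a server. read_pages, fake_read, FAKE_DOC, and the max_bytes budget are illustrative names and values that are not part of the skill.

```python
import asyncio
from typing import Any, Awaitable, Callable

# An async callable (doc_id, offset, limit) -> parsed read_content payload.
ReadFn = Callable[[str, int, int], Awaitable[dict[str, Any]]]


async def read_pages(read: ReadFn, doc_id: str, offset: int = 0,
                     limit: int = 4096, max_bytes: int = 32768) -> str:
    """Follow next_offset while more is true, stopping once max_bytes have been read."""
    parts: list[str] = []
    total = 0
    while total < max_bytes:
        page = await read(doc_id, offset, min(limit, max_bytes - total))
        parts.append(page.get("text", ""))
        returned = page.get("bytes_returned", 0)
        total += returned
        if not page.get("more") or returned == 0:
            break  # end of document, or no progress was made
        offset = page["next_offset"]
    return "".join(parts)


# Stand-in for a real read_content call, backed by an in-memory ASCII document.
FAKE_DOC = ("Attention maps queries to keys. " * 8).encode("utf-8")


async def fake_read(doc_id: str, offset: int, limit: int) -> dict[str, Any]:
    data = FAKE_DOC[offset:offset + limit]
    end = offset + len(data)
    return {"text": data.decode("utf-8"), "bytes_returned": len(data),
            "next_offset": end, "more": end < len(FAKE_DOC)}


full_text = asyncio.run(read_pages(fake_read, "d1", limit=50))
```

With the SciverseClient shown earlier, the injected reader would wrap call_tool("read_content", ...) and decode the payload with json.loads(client.parse_text_result(result)).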