SKILL.md

name:: hal-archive-api
description:: Access French and European research via the HAL open archive API
emoji:: 🇫🇷
category:: literature
subcategory:: fulltext
keywords:: ["HAL", "French research", "open archive", "CNRS", "European research", "institutional repository"]
source:: https://api.archives-ouvertes.fr/

HAL Open Archive API

Overview

HAL (Hyper Articles en Ligne) is France's national open archive for scholarly deposits. Managed by CNRS, it hosts more than 4 million full-text documents from French research institutions and their international collaborators. The API offers Solr-based search that returns full metadata and PDF links, and the archive also supports OAI-PMH harvesting. Access is free and requires no authentication.

API Endpoints

Search API

# Keyword search
curl "https://api.archives-ouvertes.fr/search/?q=machine+learning&rows=20&wt=json"

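# Search a specific field (title_t is the tokenized variant; title_s matches the whole title exactly)
curl -G "https://api.archives-ouvertes.fr/search/" \
  --data-urlencode 'q=title_t:"deep learning"' -d "wt=json"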

# Filter by document type
curl "https://api.archives-ouvertes.fr/search/?q=neural+networks&\
fq=docType_s:ART&rows=20&wt=json"

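# Filter by year and language (--data-urlencode encodes the spaces and brackets)
curl -G "https://api.archives-ouvertes.fr/search/" \
  --data-urlencode "q=climate change" \
  --data-urlencode "fq=producedDateY_i:[2023 TO 2026]" \
  --data-urlencode "fq=language_s:en" -d "wt=json"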

# Filter by institution
curl "https://api.archives-ouvertes.fr/search/?q=robotics&\
fq=structId_i:441569&wt=json"

# Return specific fields
curl "https://api.archives-ouvertes.fr/search/?q=CRISPR&\
fl=halId_s,title_s,authFullName_s,producedDateY_i,uri_s,files_s&wt=json"

Search Fields

| Field | Description | Example |
|-------|-------------|---------|
| title_s | Title | title_s:"attention mechanism" |
| authFullName_s | Author name | authFullName_s:"Yann LeCun" |
| abstract_s | Abstract | abstract_s:transformer |
| keyword_s | Keywords | keyword_s:"natural language" |
| producedDateY_i | Year | producedDateY_i:2024 |
| docType_s | Document type | docType_s:ART |
| language_s | Language | language_s:en |
| domain_s | Domain/subject | domain_s:info.info-ai |
| journalTitle_s | Journal name | journalTitle_s:"Nature" |
| structId_i | Institution (lab/university) ID | structId_i:441569 |

The suffix indicates the field type: _s fields are strings matched against the whole value, and _i fields are integers. HAL also exposes tokenized _t variants (for example title_t and abstract_t) that match individual words.
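Field clauses like the ones in the table can be assembled programmatically. The following is a minimal sketch, not part of the HAL API: `field_query` and `and_query` are hypothetical helper names, and quoting every value as a phrase is an assumption that suits string fields.

```python
def field_query(field: str, value: str) -> str:
    """Return a Solr clause such as language_s:"en"."""
    # Escape backslashes first, then double quotes, so the phrase stays intact.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def and_query(**fields: str) -> str:
    """Join several field clauses with AND, in the order given."""
    return " AND ".join(field_query(name, value)
                        for name, value in fields.items())


print(and_query(keyword_s="natural language", language_s="en"))
```

The resulting string can be passed as the `q` or `fq` parameter; an HTTP client such as `requests` takes care of URL encoding.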

Document Types

| Code | Type |
|------|------|
| ART | Journal article |
| COMM | Conference paper |
| THESE | PhD thesis |
| HDR | Habilitation thesis |
| REPORT | Report |
| COUV | Book chapter |
| OUV | Book |
| POSTER | Poster |
| UNDEFINED | Preprint/other |
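To restrict a search to several document types at once, the codes can be combined with Solr's OR syntax inside a single `fq` clause. This is a small sketch under that assumption; `DOC_TYPES` simply mirrors the table above, and `doc_type_filter` is a hypothetical helper name.

```python
DOC_TYPES = {
    "ART": "Journal article",
    "COMM": "Conference paper",
    "THESE": "PhD thesis",
    "HDR": "Habilitation thesis",
    "REPORT": "Report",
    "COUV": "Book chapter",
    "OUV": "Book",
    "POSTER": "Poster",
    "UNDEFINED": "Preprint/other",
}


def doc_type_filter(*codes: str) -> str:
    """Build an fq clause matching any of the given document type codes."""
    if not codes:
        raise ValueError("at least one document type code is required")
    unknown = [code for code in codes if code not in DOC_TYPES]
    if unknown:
        raise ValueError(f"unknown document type(s): {unknown}")
    if len(codes) == 1:
        return f"docType_s:{codes[0]}"
    # Parentheses group the alternatives under the same field.
    return "docType_s:(" + " OR ".join(codes) + ")"


print(doc_type_filter("ART", "COMM"))
```

Validating against the known codes catches typos such as `THESIS` before a request is sent.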

Query Parameters

| Parameter | Description |
|-----------|-------------|
| q | Solr query |
| fq | Filter query |
| fl | Fields to return |
| rows | Results per page (max 10000) |
| start | Pagination offset |
| sort | Sort order (e.g., producedDateY_i desc) |
| wt | Format: json, xml, csv |
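The `start` and `rows` parameters can be combined to walk through a large result set page by page. The sketch below assumes a caller-supplied `fetch` function (a hypothetical name) that takes a parameter dict and returns the decoded JSON response; in real use it would wrap an HTTP GET against the search endpoint. An in-memory stand-in is used here so the example runs without network access.

```python
from typing import Callable, Iterator


def iter_docs(fetch: Callable[[dict], dict], query: str,
              page_size: int = 100, max_docs: int = 1000) -> Iterator[dict]:
    """Yield documents page by page until results or max_docs run out."""
    start = 0
    yielded = 0
    while yielded < max_docs:
        params = {"q": query, "wt": "json",
                  "rows": page_size, "start": start}
        response = fetch(params).get("response", {})
        docs = response.get("docs", [])
        if not docs:
            return
        for doc in docs:
            yield doc
            yielded += 1
            if yielded >= max_docs:
                return
        start += page_size
        # numFound is the total number of matches reported by the server.
        if start >= response.get("numFound", 0):
            return


def fake_fetch(params: dict) -> dict:
    """In-memory stand-in for the API, serving five fake records."""
    all_docs = [{"halId_s": f"hal-{i:08d}"} for i in range(5)]
    start, rows = params["start"], params["rows"]
    return {"response": {"numFound": len(all_docs), "start": start,
                         "docs": all_docs[start:start + rows]}}


ids = [doc["halId_s"] for doc in iter_docs(fake_fetch, "*:*", page_size=2)]
print(ids)
```

The `max_docs` cap keeps an overly broad query from paging through the whole archive.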

Response Structure

{
  "response": {
    "numFound": 12500,
    "start": 0,
    "docs": [
      {
        "halId_s": "hal-01234567",
        "title_s": ["Deep Learning for Climate Modeling"],
        "authFullName_s": ["Marie Dupont", "Jean Martin"],
        "producedDateY_i": 2024,
        "docType_s": "ART",
        "journalTitle_s": "Environmental Modelling",
        "uri_s": "https://hal.science/hal-01234567",
        "files_s": ["https://hal.science/hal-01234567/document"],
        "domain_s": ["sde.es", "info.info-ai"],
        "abstract_s": ["We propose a novel deep learning approach..."],
        "language_s": ["en"]
      }
    ]
  }
}
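In the response above, some fields are lists (`title_s`, `files_s`, `language_s`) while others are scalars (`halId_s`, `producedDateY_i`). The following sketch shows one way to read both kinds safely and to compare `numFound` with the number of documents actually returned. The helper names `first` and `summarize` are hypothetical, and the sample payload is a trimmed copy of the structure shown above.

```python
def first(value):
    """Return the first item of a list, a scalar unchanged, or None."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def summarize(payload: dict) -> dict:
    """Summarize a search response: totals, titles, and first PDF links."""
    response = payload.get("response", {})
    docs = response.get("docs", [])
    return {
        "total": response.get("numFound", 0),
        "returned": len(docs),
        "titles": [first(doc.get("title_s")) for doc in docs],
        "pdfs": [first(doc.get("files_s")) for doc in docs],
    }


sample = {
    "response": {
        "numFound": 12500,
        "start": 0,
        "docs": [{
            "halId_s": "hal-01234567",
            "title_s": ["Deep Learning for Climate Modeling"],
            "files_s": ["https://hal.science/hal-01234567/document"],
        }],
    }
}
print(summarize(sample))
```

When `total` exceeds `returned`, further pages are available through the `start` parameter.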

Python Usage

from typing import Optional

import requests

BASE_URL = "https://api.archives-ouvertes.fr/search/"


def search_hal(query: str, rows: int = 20,
               doc_type: Optional[str] = None,
               from_year: Optional[int] = None,
               language: Optional[str] = None) -> list:
    """Search the HAL open archive and return simplified records."""
    params = {
        "q": query,
        "wt": "json",
        "rows": rows,
        "fl": "halId_s,title_s,authFullName_s,producedDateY_i,"
              "uri_s,files_s,docType_s,journalTitle_s,abstract_s",
        "sort": "producedDateY_i desc",
    }

    fq = []
    if doc_type:
        fq.append(f"docType_s:{doc_type}")
    if from_year:
        # Open-ended upper bound, so the range never goes stale
        fq.append(f"producedDateY_i:[{from_year} TO *]")
    if language:
        fq.append(f"language_s:{language}")
    if fq:
        params["fq"] = fq

    # A timeout keeps the call from hanging indefinitely on a stalled connection
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for doc in data.get("response", {}).get("docs", []):
        # title_s is normally a list; fall back safely if it is missing or empty
        titles = doc.get("title_s") or [""]
        title = titles[0] if isinstance(titles, list) else titles
        results.append({
            "hal_id": doc.get("halId_s"),
            "title": title,
            "authors": doc.get("authFullName_s", []),
            "year": doc.get("producedDateY_i"),
            "type": doc.get("docType_s"),
            "journal": doc.get("journalTitle_s"),
            "url": doc.get("uri_s"),
            # First file link, or None when the record has no deposited file
            "pdf": (doc.get("files_s") or [None])[0],
        })
    return results


def search_theses(topic: str, from_year: int = 2020) -> list:
    """Find French PhD theses on a topic."""
    return search_hal(topic, rows=50, doc_type="THESE",
                      from_year=from_year)


def get_institution_publications(struct_id: int,
                                 from_year: int = 2023) -> list:
    """Get publications from a specific institution."""
    params = {
        "q": "*:*",
        "fq": [f"structId_i:{struct_id}",
               f"producedDateY_i:[{from_year} TO *]"],
        "wt": "json",
        "rows": 100,
        "fl": "halId_s,title_s,authFullName_s,producedDateY_i,docType_s",
        "sort": "producedDateY_i desc",
    }
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("response", {}).get("docs", [])


# Example: find recent French AI research
papers = search_hal("intelligence artificielle", from_year=2024)
for p in papers:
    pdf = " [PDF]" if p["pdf"] else ""
    print(f"[{p['year']}] {p['title']}{pdf}")

# Example: find PhD theses on NLP
theses = search_theses("natural language processing")
for t in theses:
    print(f"{t['title']} — {', '.join(t['authors'][:2])}")

HAL Domains

| Code | Domain |
|------|--------|
| info | Computer Science |
| math | Mathematics |
| phys | Physics |
| sde | Environmental Sciences |
| sdv | Life Sciences |
| shs | Social Sciences & Humanities |
| chim | Chemistry |
| spi | Engineering Sciences |
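The domain codes can be used to group search results by discipline. The sketch below assumes that sub-domain codes start with their top-level code followed by a dot, as in the `info.info-ai` and `sde.es` values shown in the response example. `TOP_LEVEL_DOMAINS` mirrors the table above, and the function names are hypothetical.

```python
from collections import Counter

TOP_LEVEL_DOMAINS = {
    "info": "Computer Science",
    "math": "Mathematics",
    "phys": "Physics",
    "sde": "Environmental Sciences",
    "sdv": "Life Sciences",
    "shs": "Social Sciences & Humanities",
    "chim": "Chemistry",
    "spi": "Engineering Sciences",
}


def top_level(code: str) -> str:
    """Return the part of a domain code before the first dot."""
    return code.split(".", 1)[0]


def count_by_domain(docs: list) -> Counter:
    """Count documents per top-level domain, once per document and domain."""
    counts = Counter()
    for doc in docs:
        # A set avoids double counting two sub-domains of the same discipline.
        for top in {top_level(code) for code in doc.get("domain_s", [])}:
            counts[TOP_LEVEL_DOMAINS.get(top, top)] += 1
    return counts


docs = [
    {"domain_s": ["sde.es", "info.info-ai"]},
    {"domain_s": ["info.info-ai", "info.info-lg"]},
]
print(count_by_domain(docs))
```

Codes missing from the mapping are counted under their raw top-level code rather than dropped.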

