diff --git a/1.5.3/.DS_Store b/1.5.3/.DS_Store
new file mode 100644
index 0000000..56f2a9b
Binary files /dev/null and b/1.5.3/.DS_Store differ
diff --git a/1.5.3/index.html b/1.5.3/index.html
index 65f027a..4aa14b1 100644
--- a/1.5.3/index.html
+++ b/1.5.3/index.html
@@ -672,7 +672,7 @@

What is MEGAnno?

  • Seamlessly incorporate both human and LLM data labels with verification workflows and integration with popular LLMs. This enables LLM agents to label data first while humans focus on verifying a subset of potentially problematic LLM labels.
  • Figure 1. MEGAnno's unique capabilities
-    Figure 1. MEGAnno unique capabilities
+    Figure 1. MEGAnno's unique capabilities

    System Overview

    MEGAnno provides two key components: (1) a Python client library featuring interactive widgets and (2) a back-end service consisting of web API and database servers. To use our system, a user can interact with a Jupyter Notebook that has the MEGAnno client installed. Through programmatic interfaces and UI widgets, the client communicates with the service. Figure 2. Overview of MEGAnno+ system.
diff --git a/1.5.3/search/search_index.json b/1.5.3/search/search_index.json
index b012599..3985a5c 100644
--- a/1.5.3/search/search_index.json
+++ b/1.5.3/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to MEGAnno documentation","text":""},{"location":"#how-to-get-started","title":"How to get started?","text":"

    There are 2 ways to get started with MEGAnno:

    1. Demo system access: We prepared a Google Colab notebook for this demo. To run the Colab notebook, you’ll need a Google account, an OpenAI API key, and a MEGAnno access token (you can get one by filling out the request form).

    2. Your own MEGAnno environment: To use MEGAnno for your own projects, set up a self-hosted MEGAnno service by following the self-hosted installation instructions.

    "},{"location":"#what-is-meganno","title":"What is MEGAnno?","text":"

    Many existing data annotation tools focus on the annotator, enabling them to annotate data and manage annotation activities. Instead, MEGAnno is an open-source data annotation tool that puts the data scientist first, enabling you to bootstrap annotation tasks and manage the continual evolution of annotations through the machine learning lifecycle.

    In addition, MEGAnno’s unique capabilities include:

    Figure 1. MEGAnno unique capabilities

    "},{"location":"#system-overview","title":"System Overview","text":"

    MEGAnno provides two key components: (1) a Python client library featuring interactive widgets and (2) a back-end service consisting of web API and database servers. To use our system, a user can interact with a Jupyter Notebook that has the MEGAnno client installed. Through programmatic interfaces and UI widgets, the client communicates with the service. Figure 2. Overview of MEGAnno+ system.

    Please see the Getting Started page for setup instructions and the Advanced Features page for more of the features we provide.

    "},{"location":"#references","title":"References","text":"

    @inproceedings{kim-etal-2024-meganno,\n    title = \"{MEGA}nno+: A Human-{LLM} Collaborative Annotation System\",\n    author = \"Kim, Hannah and Mitra, Kushan and Li Chen, Rafael and Rahman, Sajjadur and Zhang, Dan\",\n    editor = \"Aletras, Nikolaos and De Clercq, Orphee\",\n    booktitle = \"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations\",\n    month = mar,\n    year = \"2024\",\n    address = \"St. Julians, Malta\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2024.eacl-demo.18\",\n    pages = \"168--176\",\n}\n
    @inproceedings{zhang-etal-2022-meganno,\n    title = \"{MEGA}nno: Exploratory Labeling for {NLP} in Computational Notebooks\",\n    author = \"Zhang, Dan and Kim, Hannah and Li Chen, Rafael and Kandogan, Eser and Hruschka, Estevam\",\n    editor = \"Dragut, Eduard and Li, Yunyao and Popa, Lucian and Vucetic, Slobodan and Srivastava, Shashank\",\n    booktitle = \"Proceedings of the Fourth Workshop on Data Science with Human-in-the-Loop (Language Advances)\",\n    month = dec,\n    year = \"2022\",\n    address = \"Abu Dhabi, United Arab Emirates (Hybrid)\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2022.dash-1.1\",\n    pages = \"1--7\",\n}\n

    "},{"location":"advanced/","title":"Advanced features","text":"

    This notebook provides examples of some of the advanced features.

    "},{"location":"advanced/#updating-schema","title":"Updating Schema","text":"

    Annotation requirements can change as projects evolve. To update the schema for a project, simply call set_schemas with the new schema object. For example, to expand the schema we set in the basic notebook:

    demo.get_schemas().set_schemas({\n    \"label_schema\": [\n        {\n            \"name\": \"sentiment\",\n            \"level\": \"record\",\n            \"options\": [\n                { \"value\": \"pos\", \"text\": \"positive\" },\n                { \"value\": \"neg\", \"text\": \"negative\" },\n                { \"value\": \"neu\", \"text\": \"neutral\" },  # adding a new option\n            ]\n        },\n        # adding a span-level label\n        {\n            \"name\": \"sp\",\n            \"level\": \"span\",\n            \"options\": [\n                { \"value\": \"pos\", \"text\": \"positive\" },\n                { \"value\": \"neg\", \"text\": \"negative\" },\n            ]\n        }\n    ]\n})\n
    Only the latest schema will be active, but all previous ones will be preserved. To see the full history:
    demo.get_schemas().get_history()\n

    "},{"location":"advanced/#metadata","title":"Metadata","text":"

    In MEGAnno, metadata refers to auxiliary information associated with data records. MEGAnno takes user-defined functions to generate metadata and uses it to find important subsets and assist human annotators. Here we show two examples.

    Example 1: Adding Sentence-BERT embeddings to data records. The embeddings can later be used to compute similarities between records.

    # Example 1, adding sentence-bert embedding.\nfrom sentence_transformers import SentenceTransformer\nmodel = SentenceTransformer(\"all-MiniLM-L6-v2\")\n# set metadata generation function \ndemo.set_metadata(\"bert-embedding\",lambda x: list(model.encode(x).astype(float)), 500)\n

    Example 2: Extracting hashtags as annotation context.

    # user-defined function to extract hashtags\ndef extract_hashtags(text):\n    hashtag_list = []\n    for word in text.split():\n        if word.startswith(\"#\"):\n            hashtag_list.append(word)\n    # the widget can render markdown text, so return a markdown bullet list\n    return \"\".join([\"- {}\\n\".format(x) for x in hashtag_list])\n\n# apply metadata to the project\ndemo.set_metadata(\"hashtag\", extract_hashtags, 500)\n

    With the hashtag metadata set, the MEGAnno widget can show it as context at annotation time.

    s1 = demo.search(keyword=\"\", limit=50, skip=0, meta_names=[\"hashtag\"])\ns1.show()\n

    "},{"location":"advanced/#advanced-subset-generation","title":"Advanced Subset Generation","text":"

    In addition to exact keyword matches, MEGAnno provides more advanced ways of generating subsets.

    "},{"location":"advanced/#regex-based-searches","title":"Regex-based Searches","text":"

    MEGAnno supports searches based on regular expressions:

    s2_reg = demo.search(regex=\".* (delay) .*\", limit=50, skip=0)\ns2_reg.show({\"view\": \"table\"})\n
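    The pattern above matches only when delay appears as a standalone token with a space on each side, so a tweet that begins or ends with delay, or that only contains delayed, is not matched. The sketch below checks this locally with Python's re module; it assumes the service applies the pattern like re.search (for single-line text, re.fullmatch gives the same answers here), which this page does not confirm.

```python
import re

# The pattern from the regex search above.
PATTERN = re.compile('.* (delay) .*')

def matches(text: str) -> bool:
    # Assumption: search semantics; the pattern effectively requires ' delay ' in the text.
    return PATTERN.search(text) is not None

tweets = [
    '@united the delay was 3 hours',    # standalone token with spaces on both sides
    '@united yet another delay',        # delay at the end: no trailing space
    '@JetBlue flight delayed again',    # delayed is a different token
    'delay again @SouthwestAir',        # delay at the start: no leading space
]
for t in tweets:
    print(matches(t), t)
```

    If delayed and delays should also be caught, a looser pattern such as .*delay.* would match them; which one fits depends on the task.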

    "},{"location":"advanced/#subset-suggestion","title":"Subset Suggestion","text":"

    Searches initiated by users let them explore the dataset in a controlled way. Still, searches are only as good as users’ knowledge of the data and domain. MEGAnno provides an automated subset suggestion engine to assist with exploration. Embedding-based suggestions rely on data-embedding vectors that the user provides as metadata.

    For example, suggest_similar suggests the neighbors (by distance in the embedding space) of the records in the query subset:

    s3 = demo.search(keyword=\"delay\", limit=3, skip=0) # source subset\ns4 = s3.suggest_similar(\"bert-embedding\", limit=4) # needs to provide a valid meta_name\ns4.show()\n
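    Conceptually, suggest_similar is a nearest-neighbor lookup over embedding metadata. The sketch below illustrates that idea in plain Python using cosine similarity over a toy dictionary of embeddings; the similarity metric, the scoring rule (best similarity to any query record), and the helper name suggest_similar_local are assumptions for illustration, not MEGAnno's implementation.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def suggest_similar_local(query_ids: List[str], embeddings: Dict[str, List[float]], limit: int) -> List[str]:
    # Hypothetical helper: score each record outside the query subset by its
    # highest similarity to any query record, then return the top `limit` IDs.
    scores = {}
    for rid, vec in embeddings.items():
        if rid in query_ids:
            continue
        scores[rid] = max(cosine(vec, embeddings[q]) for q in query_ids)
    return sorted(scores, key=scores.get, reverse=True)[:limit]

# Toy 2-d embeddings; real ones would come from the bert-embedding metadata.
emb = {
    'r1': [1.0, 0.0],
    'r2': [0.9, 0.1],
    'r3': [0.0, 1.0],
    'r4': [0.7, 0.7],
}
print(suggest_similar_local(['r1'], emb, limit=2))
```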
    "},{"location":"advanced/#subset-operations","title":"Subset Operations","text":"

    MEGAnno supports set operations to build new subsets from existing ones:

    # intersection\ns_intersection = s1 & s2_reg  # or s1.intersection(s2_reg)\n# union\ns_union = s1 | s2_reg  # or s1.union(s2_reg)\n# difference\ns_diff = s1 - s2_reg  # or s1.difference(s2_reg)\n
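    One way to picture these operators is to treat each subset as a set of record UUIDs, with &, |, and - behaving like Python set operations. The sketch below assumes exactly that model; the UUID strings are made up, and real Subset objects carry more than their IDs.

```python
# Hypothetical record UUIDs returned by two searches.
s1_ids = {'u1', 'u2', 'u3'}
s2_ids = {'u2', 'u3', 'u4'}

intersection = s1_ids & s2_ids  # records in both subsets
union = s1_ids | s2_ids         # records in either subset
difference = s1_ids - s2_ids    # records in s1 but not in s2

print(sorted(intersection), sorted(union), sorted(difference))
```

    Note that difference is not symmetric: under this model, s2 - s1 would leave only u4.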

    "},{"location":"advanced/#dashboard-administrator-only","title":"Dashboard (administrator-only)","text":"

    MEGAnno provides a built-in visual monitoring dashboard that shows users the real-time status of the annotation project. As projects evolve, users often need to understand the project’s status to decide on next steps, such as collecting more data points with certain characteristics or adding a new class to the task definition. To aid such analysis, the dashboard widget packs common statistics and analytical visualizations (e.g., annotation progress, distribution of labels, and annotator agreement) based on a survey of our pilot users.

    To bring up the project dashboard:

    demo.show()\n
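    To make two of those statistics concrete, the sketch below computes a label distribution and a pairwise agreement score for two annotators. It uses Cohen's kappa, a common chance-corrected agreement measure, as a swapped-in example; the dashboard may use a different agreement metric, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def label_distribution(labels: List[str]) -> Dict[str, float]:
    # Fraction of annotations that received each label value.
    counts = Counter(labels)
    return {value: n / len(labels) for value, n in counts.items()}

def cohen_kappa(a: List[str], b: List[str]) -> float:
    # Chance-corrected agreement between two annotators who labeled the same records.
    if len(a) != len(b) or not a:
        raise ValueError('need two equal-length, non-empty label lists')
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for every record
    return (observed - expected) / (1 - expected)

ann1 = ['pos', 'neg', 'neg', 'pos', 'neu']
ann2 = ['pos', 'neg', 'pos', 'pos', 'neu']
print(label_distribution(ann1))
print(round(cohen_kappa(ann1, ann2), 3))
```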

    Other features

    "},{"location":"basic/","title":"Basic Usages","text":"

    Please also refer to this notebook for a running example of the basic pipeline of using MEGAnno in a notebook.

    "},{"location":"basic/#setting-schema","title":"Setting Schema","text":"

    The schema defines the annotation task. The example below sets a schema for a sentiment analysis task with positive and negative options:

    demo.get_schemas().set_schemas({\n    \"label_schema\": [\n        {\n            \"name\": \"sentiment\",\n            \"level\": \"record\", \n            \"options\": [\n                { \"value\": \"pos\", \"text\": \"positive\" },\n                { \"value\": \"neg\", \"text\": \"negative\" },\n            ]\n        }\n    ]\n})\ndemo.get_schemas().value(active=True)       \n
    A label's level can be either record or span. Record-level labels apply to the entire data record, while span-level labels are associated with a text span within the record. See Updating Schema for an example of a more complex schema.
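    Before calling set_schemas, it can help to sanity-check the schema dictionary locally. The helper below is hypothetical (not part of meganno_client) and only checks the structure used in the examples on this page: a label_schema list whose entries have a name, a level of record or span, and options with value and text keys. The uniqueness checks are assumptions about what a sensible schema looks like rather than documented MEGAnno rules.

```python
from typing import Any, Dict, List

VALID_LEVELS = {'record', 'span'}

def check_schema(schema: Dict[str, Any]) -> List[str]:
    # Hypothetical local check; returns a list of problems (empty if none were found).
    problems = []
    labels = schema.get('label_schema', [])
    if not labels:
        problems.append('label_schema is missing or empty')
    names = [label.get('name') for label in labels]
    if len(names) != len(set(names)):
        problems.append('label names are not unique')
    for label in labels:
        name = label.get('name', '<unnamed>')
        if label.get('level') not in VALID_LEVELS:
            problems.append(f'{name}: level must be record or span')
        options = label.get('options', [])
        if not options:
            problems.append(f'{name}: no options defined')
        if any('value' not in opt or 'text' not in opt for opt in options):
            problems.append(f'{name}: every option needs a value and a text')
        values = [opt.get('value') for opt in options]
        if len(values) != len(set(values)):
            problems.append(f'{name}: duplicate option values')
    return problems

schema = {
    'label_schema': [
        {
            'name': 'sentiment',
            'level': 'record',
            'options': [
                {'value': 'pos', 'text': 'positive'},
                {'value': 'neg', 'text': 'negative'},
            ],
        }
    ]
}
print(check_schema(schema))
```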

    "},{"location":"basic/#importing-data","title":"Importing Data","text":"

    Given a pandas DataFrame like the one below (an example drawn from this Twitter US Airline Sentiment dataset):

    id | tweet
    0 | @united how else would I know it was denied?
    1 | @JetBlue my SIL bought tix for us to NYC. We were told at the gate that her cc was declined. Supervisor accused us of illegal activity.
    2 | @JetBlue dispatcher keeps yelling and hung up on me!

    To import the data, provide the column names for id (a unique import identifier for each data record) and content (the raw text field):

    demo.import_data_df(df, column_mapping={\n    \"id\": \"id\",\n    \"content\": \"tweet\"\n})\n
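    Because id must uniquely identify each record, a quick check before import can catch duplicate IDs or a misspelled column name. The sketch below is a hypothetical helper (not part of meganno_client) that works on rows as dictionaries; a pandas DataFrame can be converted to that shape with df.to_dict('records').

```python
from typing import Dict, List

def check_import_rows(rows: List[Dict[str, object]], column_mapping: Dict[str, str]) -> List[str]:
    # Hypothetical pre-import check; returns a list of problems (empty if none).
    problems = []
    columns = set(rows[0]) if rows else set()  # assumes every row has the same keys
    for field in ('id', 'content'):
        col = column_mapping.get(field)
        if col is None:
            problems.append(f'column_mapping has no entry for {field}')
        elif col not in columns:
            problems.append(f'column {col} (mapped to {field}) not found')
    id_col = column_mapping.get('id')
    if id_col in columns:
        ids = [row[id_col] for row in rows]
        if len(ids) != len(set(ids)):
            problems.append('id values are not unique')
    return problems

rows = [
    {'id': 0, 'tweet': '@united how else would I know it was denied?'},
    {'id': 2, 'tweet': '@JetBlue dispatcher keeps yelling and hung up on me!'},
]
print(check_import_rows(rows, {'id': 'id', 'content': 'tweet'}))
```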
    "},{"location":"basic/#exploratory-labeling","title":"Exploratory Labeling","text":"

    Not all data points are equally important for downstream models and applications. There are often cases where users want to prioritize a particular batch (e.g., to achieve better class or domain coverage, or to focus on the data points that the downstream model cannot predict well). MEGAnno provides a flexible and controllable way of organizing annotation projects through exploratory labeling: users first identify an interesting subset and then assign labels to the data in that subset. We provide a set of “power tools” to help identify valuable subsets.

    The script below searches for data records containing the keyword \"delay\" and brings up an annotation widget in the next cell. More examples here.

    # search results => subset s1\ns1 = demo.search(keyword=\"delay\", limit=10, skip=0)\n# bring up a widget \ns1.show()\n
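    The limit and skip arguments read like pagination over the matching records. The local sketch below illustrates that reading under two assumptions this page does not confirm: case-insensitive substring matching and a stable result order. keyword_search is a hypothetical name, not a MEGAnno function.

```python
from typing import List

def keyword_search(texts: List[str], keyword: str, limit: int, skip: int = 0) -> List[str]:
    # Hypothetical: keep texts containing the keyword (case-insensitive),
    # then drop the first `skip` matches and return at most `limit` of the rest.
    matches = [t for t in texts if keyword.lower() in t.lower()]
    return matches[skip:skip + limit]

texts = ['Delay at gate 4', 'on time today', 'another delay', 'delays everywhere', 'no issues']
page1 = keyword_search(texts, 'delay', limit=2, skip=0)
page2 = keyword_search(texts, 'delay', limit=2, skip=2)
print(page1, page2)
```

    Under this model, an empty keyword (as in the hashtag example) matches every record, so limit and skip alone select the page.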

    "},{"location":"basic/#column-filters","title":"Column Filters","text":"

    To view all column filters, click the \"Filters\" button; to reset all column filters, click the \"Reset filters\" button.

    "},{"location":"basic/#column-order-visibility","title":"Column Order & Visibility","text":"

    1. To reorder or resize a column, hover over the column's drag handles (the left grip handle to reorder, the right column edge to resize).
    2. To toggle column visibility, click \"Columns\", then toggle a column to show or hide it.
    3. To reset column ordering and visibility, click the \"Reset columns\" button.

    "},{"location":"basic/#metadata-focus-view","title":"Metadata Focus-view","text":"

    To focus on a single metadata value, click the \"Settings\" button, then choose a metadata name from the list.

    "},{"location":"basic/#exporting","title":"Exporting","text":"

    Although iterations can happen within a single notebook, it's easy to export the data and the collected annotations:

    # collecting the annotation generated by all annotators\ndemo.export()\n
    "},{"location":"llm_integration/","title":"LLM Integration","text":"

    This notebook provides an example workflow of utilizing LLMs as annotation agents within MEGAnno.

    Figure 1. Human-LLM collaborative workflow.

    MEGAnno offers a simple human-LLM collaborative annotation workflow: LLM annotation followed by human verification. Put simply, LLM agents label data first (Figure 1, step ①), and humans verify LLM labels as needed. For most tasks and datasets, LLM labels can be used as is; for a subset of difficult or uncertain instances (Figure 1, step ②), humans verify the LLM labels, confirming the right ones and correcting the wrong ones (Figure 1, step ③). In this way, LLM annotation can be automated, and human effort can be directed to where it is most needed to improve the quality of the final labels.

    An overview of the entire system and its key concepts is shown below.

    Figure 2. Overview of MEGAnno+ system.

    Subset: refers to a slice of data created from user-defined searches.

    Record: refers to an item within the data corpus.

    Agent: an Agent is defined by the configuration of the LLM (e.g., the model’s name, version, and hyper-parameters) and a prompt template.

    Job: when an Agent is employed to annotate a selected data Subset, the execution is referred to as a Job.

    Label: stores the label assigned to a particular Record.

    Label_Metadata: captures additional aspects of a label, such as the LLM confidence score or the length of the label response.

    Verification: captures annotations from human users that confirm or update LLM labels.
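    The relationships among these concepts can be sketched as plain Python dataclasses. The field names and the example model configuration below are assumptions chosen to mirror the definitions above, not MEGAnno's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Agent:
    # An LLM configuration plus a prompt template.
    uuid: str
    model_config: Dict[str, object]
    prompt_template: str

@dataclass
class Job:
    # One execution of an Agent over a Subset of records.
    uuid: str
    agent_uuid: str
    record_uuids: List[str]

@dataclass
class Label:
    # The label assigned to one Record; metadata holds Label_Metadata values.
    record_uuid: str
    label_name: str
    value: str
    job_uuid: Optional[str] = None  # set when the label came from an LLM Job
    metadata: Dict[str, float] = field(default_factory=dict)

@dataclass
class Verification:
    # A human decision on an LLM label: confirmed as is, or corrected.
    record_uuid: str
    label_name: str
    verified_value: str
    confirmed: bool

agent = Agent('a1', {'model': 'gpt-3.5-turbo', 'temperature': 0}, 'Classify the tweet: {input}')
job = Job('j1', agent.uuid, ['r1', 'r2'])
label = Label('r1', 'sentiment', 'pos', job_uuid=job.uuid, metadata={'conf': 0.62})
check = Verification('r1', 'sentiment', 'neg', confirmed=False)
print(job)
print(label)
print(check)
```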

    "},{"location":"llm_integration/#llm-annotation","title":"LLM Annotation","text":"

    MEGAnno achieves LLM annotation in three steps, as shown in the figure below.

    Figure 3. Steps in the LLM annotation workflow.

    The preprocessing step generates prompts and validates the model configuration. Users can specify an LLM model, define its configuration, and customize a prompt template (Figure 4). Together, these define an Agent that can be used for the annotation task. Registered Agents can be reused later.

    Figure 4. Prompt Template UI. Users can customize task instructions and preview generated prompts.
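    Prompt generation in the preprocessing step amounts to filling a template once per record. The sketch below uses Python's string.Template with hypothetical ${labels} and ${input} placeholders; MEGAnno's PromptTemplate has its own format, so this only illustrates the idea, including the (uuid, prompt) pairs that generate_prompts is documented to return and the record shape listed for OpenAIJob.

```python
from string import Template
from typing import Dict, List, Tuple

# Hypothetical template; the placeholder names are assumptions.
TEMPLATE = Template(
    'Label the sentiment of the text as one of: ${labels}. '
    'Answer with the label only. Text: ${input}'
)

def generate_prompts(records: List[Dict[str, str]], options: List[str]) -> List[Tuple[str, str]]:
    # Return (uuid, prompt) pairs, one per record.
    labels = ', '.join(options)
    return [
        (rec['uuid'], TEMPLATE.substitute(labels=labels, input=rec['data']))
        for rec in records
    ]

records = [{'uuid': 'r1', 'data': '@united how else would I know it was denied?'}]
for uuid, prompt in generate_prompts(records, ['pos', 'neg']):
    print(uuid, prompt)
```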

    Once the selected model configuration is validated, the next step is calling the LLM. MEGAnno handles the call to the external LLM API to obtain LLM responses. Any API errors encountered during the call are handled, and a suitable message is relayed to the user.
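    Calling an external API means coping with transient failures; run_job exposes num_retrials for this. The sketch below shows one plausible retry loop around a stand-in LLM function. call_with_retries and flaky_llm are hypothetical, and the retry policy assumed here (one initial attempt plus up to num_retrials retries, fixed wait) may differ from what MEGAnno actually does.

```python
import time
from typing import Callable, Optional

def call_with_retries(call: Callable[[str], str], prompt: str, num_retrials: int = 2, wait_s: float = 0.0) -> Optional[str]:
    # Assumed policy: one initial attempt plus up to num_retrials retries.
    for attempt in range(num_retrials + 1):
        try:
            return call(prompt)
        except Exception as err:  # a real client would catch specific API errors
            print(f'attempt {attempt + 1} failed: {err}')
            if attempt < num_retrials:
                time.sleep(wait_s)
    return None  # the caller could record this prompt as an invalid response

failures = {'left': 2}

def flaky_llm(prompt: str) -> str:
    # Stand-in for an LLM API call: fails twice, then returns a label.
    if failures['left'] > 0:
        failures['left'] -= 1
        raise RuntimeError('temporary API error')
    return 'pos'

print(call_with_retries(flaky_llm, 'Label this tweet', num_retrials=2))
```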

    Once the responses are obtained, the post-processing step extracts the label from each LLM response. Post-processing handles minor deviations in the LLM's response, such as a trailing period. Furthermore, users can set fuzzy_extraction=True to perform a fuzzy match between the LLM response and the label options in the schema; if a sufficiently close match is found, the corresponding label is assigned. The figure below shows how MEGAnno's post-processing mechanism handles several LLM responses.

    Figure 5. Example LLM responses and post-processing results by MEGAnno.
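    A minimal sketch of this kind of post-processing, assuming a normalization step that lowercases the response and strips surrounding whitespace and a trailing period, and using difflib.get_close_matches from the standard library as a stand-in for MEGAnno's fuzzy matcher. The actual matching method and threshold are not documented here; the cutoff of 0.6 is an assumption.

```python
import difflib
from typing import List, Optional

def extract_label(response: str, options: List[str], fuzzy_extraction: bool = False, cutoff: float = 0.6) -> Optional[str]:
    # Normalize minor deviations: surrounding spaces, trailing punctuation, case.
    cleaned = response.strip().rstrip('.!').strip().lower()
    lookup = {opt.lower(): opt for opt in options}
    if cleaned in lookup:
        return lookup[cleaned]
    if fuzzy_extraction:
        # Take the closest option only if it is similar enough (assumed threshold).
        close = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
        if close:
            return lookup[close[0]]
    return None  # no label could be extracted

options = ['positive', 'negative']
for resp in ['Positive.', ' negative ', 'postive', 'The tweet is neutral']:
    print(repr(resp), extract_label(resp, options), extract_label(resp, options, fuzzy_extraction=True))
```

    The typo postive is rejected by exact matching but recovered with fuzzy extraction, while a free-form sentence naming an option outside the schema still yields no label.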

    "},{"location":"llm_integration/#verification-subset-selection","title":"Verification Subset Selection","text":"

    Having a human verify every annotation in the dataset would defeat the purpose of using LLMs for a cheaper and faster annotation process. Instead, MEGAnno aids human verifiers by computing confidence scores for each annotation. Users can specify that a confidence_score be computed and stored for the LLM labels. They can then view the confidence scores, and sort and filter over them to obtain only the annotations for which the LLM had low confidence. This makes human verification easier and more efficient.
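    Selecting a verification subset then amounts to thresholding and sorting LLM labels by confidence. The sketch below assumes each label carries a confidence score between 0 and 1 in a conf field and uses a threshold of 0.7; both the field name and the threshold are hypothetical, not MEGAnno defaults.

```python
from typing import Dict, List

def select_for_verification(labels: List[Dict[str, object]], threshold: float = 0.7) -> List[Dict[str, object]]:
    # Keep labels below the threshold, least confident first.
    low = [lab for lab in labels if lab['conf'] < threshold]
    return sorted(low, key=lambda lab: lab['conf'])

llm_labels = [
    {'uuid': 'r1', 'label': 'pos', 'conf': 0.95},
    {'uuid': 'r2', 'label': 'neg', 'conf': 0.41},
    {'uuid': 'r3', 'label': 'pos', 'conf': 0.66},
]
for lab in select_for_verification(llm_labels):
    print(lab['uuid'], lab['label'], lab['conf'])
```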

    "},{"location":"llm_integration/#human-verification","title":"Human Verification","text":"

    Users can then use MEGAnno's in-notebook widget to verify LLM labels, i.e., either confirm a label as correct or reject it and specify the correct label. Finally, users can view the final annotations and export the data for downstream tasks or further analysis.

    Figure 6. Verification UI for exploring data and confirming/correcting LLM labels.
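    After verification, the final label for each record is the human's verdict where one exists and the LLM label otherwise. The sketch below shows that merge under a simplified, assumed shape (dictionaries keyed by record UUID); it is not MEGAnno's export format.

```python
from typing import Dict

def final_labels(llm: Dict[str, str], verified: Dict[str, str]) -> Dict[str, str]:
    # A verification either confirms the LLM label (same value) or corrects it.
    return {uuid: verified.get(uuid, label) for uuid, label in llm.items()}

llm = {'r1': 'pos', 'r2': 'neg', 'r3': 'pos'}
verified = {'r2': 'neg', 'r3': 'neg'}  # r2 confirmed, r3 corrected, r1 not reviewed
print(final_labels(llm, verified))
```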

    "},{"location":"quickstart/","title":"Getting Started","text":""},{"location":"quickstart/#installation","title":"Installation","text":""},{"location":"quickstart/#self-hosted-service","title":"Self-hosted Service","text":""},{"location":"quickstart/#authentication","title":"Authentication","text":"

    There are two ways to authenticate with the service:

    1. Short-term (1-hour) access by signing in with a username and password.

    2. Long-term access with an access token, without signing in every time.

    "},{"location":"quickstart/#roles","title":"Roles","text":"

    MEGAnno supports two types of user roles: Admin and Contributor. Admin users are project owners who deploy the services; they have full access to the project, such as importing data or updating schemas. Admin users can invite contributors by sharing invitation code(s) with them. Contributors can only access their own annotation namespace and cannot modify the project.

    To invite contributors, follow the instructions below:

    1. Initialize an Admin object:
  from meganno_client import Admin, Authentication\ntoken = \"...\"\nauth = Authentication(project=\"eacl_demo\", token=token)\n\nadmin = Admin(project=\"eacl_demo\", auth=auth)\n# OR\nadmin = Admin(project=\"eacl_demo\", token=token)\n
    2. Generate an invitation code:
    3. To renew or revoke an existing invitation code:
    4. New users with a valid invitation code can sign up by installing the client library and following the instructions below:
    "},{"location":"quickstart/#role-access","title":"Role Access","text":"Method Route Role GET POST /agents administrator contributor GET /agents/jobs /agents/<string:agent_uuid>/jobs GET POST /agents/<string:agent_uuid>/jobs/<string:job_uuid> /annotations/<string:record_uuid> administrator contributor job POST /annotations/batch /annotations/<string:record_uuid>/labels administrator contributor /annotations/label_metadata administrator contributor job GET POST /assignments administrator contributor POST /data /data/metadata administrator GET /data/export /data/suggest_similar administrator contributor GET /schemas administrator contributor job POST administrator POST /verifications/<string:record_uuid>/labels administrator contributor GET /annotations /view/record /view/annotation /view/verifications administrator contributor job /reconciliations administrator contributor GET /statistics/annotator/contributions /statistics/annotator/agreements /statistics/embeddings/<embed_type> /statistics/label/progress /statistics/label/distributions administrator GET POST PUT DELETE /invitations administrator GET /invitations/<invitation_code> GET POST DELETE /tokens administrator contributor"},{"location":"references/controller/","title":"Controller","text":""},{"location":"references/controller/#meganno_client.controller.Controller","title":"meganno_client.controller.Controller","text":"

    The Controller class manages annotation agents and runs agent jobs.

    "},{"location":"references/controller/#meganno_client.controller.Controller.__init__","title":"__init__(service, auth)","text":"

    Init function

    Parameters:

    Name | Type | Description | Default
    service | Service | MEGAnno service object for the connected project. | required
    auth | Authentication | MEGAnno authentication object. | required"},{"location":"references/controller/#meganno_client.controller.Controller.list_agents","title":"list_agents(created_by_filter=None, provider_filter=None, api_filter=None, show_job_list=False)","text":"

    Get the list of registered agents by their issuer IDs.

    Parameters:

    Name | Type | Description | Default
    created_by_filter | list | List of user IDs to filter agents by; if None, list all agents. | None
    provider_filter | | Return agents with the specified provider, e.g., openai. | None
    api_filter | | Return agents with the specified API, e.g., completion. | None
    show_job_list | | If True, also return the list of UUIDs of the agent's jobs. | False

    Returns:

    Type | Description
    list | A list of agents created by the specified issuers.

    "},{"location":"references/controller/#meganno_client.controller.Controller.list_jobs","title":"list_jobs(filter_by, filter_values, show_agent_details=False)","text":"

    Get the list of jobs with querying filters.

    Parameters:

    Name | Type | Description | Default
    filter_by | str | Filter option; must be one of \"agent_uuid\", \"issued_by\", \"uuid\", or None. | required
    filter_values | list | List of UUIDs of the entity specified in 'filter_by' | required
    show_agent_details | bool | If True, also return the agent configuration. | False

    Returns:

    Type | Description
    list | A list of jobs that match the given filtering criteria.

    "},{"location":"references/controller/#meganno_client.controller.Controller.list_jobs_of_agent","title":"list_jobs_of_agent(agent_uuid, show_agent_details=False)","text":"

    Get the list of jobs of a given agent.

    Parameters:

    Name | Type | Description | Default
    agent_uuid | str | Agent UUID | required
    show_agent_details | bool | If True, also return the agent configuration. | False

    Returns:

    Type | Description
    list | A list of jobs of the given agent

    "},{"location":"references/controller/#meganno_client.controller.Controller.register_agent","title":"register_agent(model_config, prompt_template_str, provider_api)","text":"

    Register an agent with backend service.

    Parameters:

    Name | Type | Description | Default
    model_config | dict | Model configuration object | required
    prompt_template_str | str | Serialized prompt template | required
    provider_api | str | Name of the provider and corresponding API, e.g., 'openai:chat' | required

    Returns:

    Type | Description
    dict | Object with the unique agent ID.

    "},{"location":"references/controller/#meganno_client.controller.Controller.persist_job","title":"persist_job(agent_uuid, job_uuid, label_name, annotation_uuid_list)","text":"

    Given annotations for a subset, persist them as a job for the project.

    Parameters:

    Name | Type | Description | Default
    agent_uuid | str | Agent UUID | required
    job_uuid | str | Job UUID | required
    label_name | str | Label name used for annotation | required
    annotation_uuid_list | list | List of UUIDs of records that have valid annotations from the job | required

    Returns:

    Type | Description
    dict | Object with the job UUID and annotation count

    "},{"location":"references/controller/#meganno_client.controller.Controller.create_agent","title":"create_agent(model_config, prompt_template, provider_api='openai:chat')","text":"

    Validate model configs and register a new agent. Return new agent's uuid.

    Parameters:

    Name | Type | Description | Default
    model_config | dict | Model configuration object | required
    prompt_template | str | PromptTemplate object | required
    provider_api | str | Name of the provider and corresponding API, e.g., 'openai:chat' | 'openai:chat'

    Returns:

    Name | Type | Description
    agent_uuid | str | Agent UUID

    "},{"location":"references/controller/#meganno_client.controller.Controller.get_agent_by_uuid","title":"get_agent_by_uuid(agent_uuid)","text":"

    Return agent model configuration, prompt template, and creator id of specified agent.

    Parameters:

    Name | Type | Description | Default
    agent_uuid | str | Agent UUID | required

    Returns:

    Type | Description
    dict | A dict containing the agent details.

    "},{"location":"references/controller/#meganno_client.controller.Controller.list_my_agents","title":"list_my_agents()","text":"

    Get the list of agents registered by me.

    Returns:

    Name | Type | Description
    agents | list | A list of agents created by me.

    "},{"location":"references/controller/#meganno_client.controller.Controller.list_my_jobs","title":"list_my_jobs(show_agent_details=False)","text":"

    Get the list of jobs issued by me.

    Parameters:

    Name | Type | Description | Default
    show_agent_details | bool | If True, also return the agent configuration. | False

    Returns:

    Name | Type | Description
    jobs | list | A list of jobs issued by me.

    "},{"location":"references/controller/#meganno_client.controller.Controller.run_job","title":"run_job(agent_uuid, subset, label_name, batch_size=1, num_retrials=2, label_meta_names=[], fuzzy_extraction=False)","text":"

    Create, run, and persist an LLM annotation job with given agent and subset.

    Parameters:

    Name | Type | Description | Default
    agent_uuid | str | UUID of the agent to be used for the job | required
    subset | Subset | [Megagon-only] MEGAnno Subset object to be annotated in the job | required
    label_name | str | Label name used for annotation | required
    batch_size | int | Batch size for each OpenAI prompt | 1
    num_retrials | int | Number of retries to OpenAI if a response fails | 2
    label_meta_names | | List of label metadata names to be set | []
    fuzzy_extraction | | Set to True to enable fuzzy extraction in post-processing | False

    Returns:

    Name | Type | Description
    job_uuid | str | Job UUID

    "},{"location":"references/openai_job/","title":"OpenAIJob","text":""},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob","title":"meganno_client.llm_jobs.OpenAIJob","text":"

    The OpenAIJob class handles calls to OpenAI APIs.

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.__init__","title":"__init__(label_schema={}, label_names=[], records=[], model_config={}, prompt_template=None)","text":"

    Init function

    Parameters:

    Name | Type | Description | Default
    label_schema | list | List of label objects | {}
    label_names | list | List of label names to be used for annotation | []
    records | list | List of records in [{'data': , 'uuid': }] format | []
    model_config | dict | Parameters for the OpenAI model | {}
    prompt_template | str | Template from which the OpenAI prompt is prepared for each record | None"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.set_openai_api_key","title":"set_openai_api_key(openai_api_key, openai_organization)","text":"

    Set the API keys needed to call the OpenAI API.

    Parameters:

    Name | Type | Description | Default
    openai_api_key | str | OpenAI API key provided by the user | required
    openai_organization | str[optional] | OpenAI organization key provided by the user | required"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.validate_openai_api_key","title":"validate_openai_api_key(openai_api_key, openai_organization) staticmethod","text":"

    Validate the OpenAI API and organization keys provided by the user.

    Parameters:

    Name | Type | Description | Default
    openai_api_key | str | OpenAI API key provided by the user | required
    openai_organization | str[optional] | OpenAI organization key provided by the user | required

    Raises:

    Type | Description
    Exception | If the API keys provided by the user are invalid, or if an error occurs when calling the OpenAI API

    Returns:

    Name | Type | Description
    openai_api_key | str | OpenAI API key
    openai_organization | str | OpenAI organization key

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.validate_model_config","title":"validate_model_config(model_config, api_name='chat') staticmethod","text":"

    Validate the LLM model config provided by the user. The model must be one of the models allowed in MEGAnno, and the parameters must match the format specified by OpenAI.

    Parameters:

    Name | Type | Description | Default
    model_config | dict | Model specifications, such as the model name and other parameters (e.g., temperature), as provided by the user | required
    api_name | str | Name of the OpenAI API, e.g., \"chat\" or \"completion\" | 'chat'

    Raises:

    Type | Description
    Exception | If the model is not among the ones provided by MEGAnno, or if the configuration format is incorrect

    Returns:

    Name | Type | Description
    model_config | dict | Model configurations

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.is_valid_prompt","title":"is_valid_prompt(prompt)","text":"

    Validate the generated prompt. It must not exceed the maximum token limit specified by OpenAI. We use the approximation 1 word ~ 1.33 tokens.

    Parameters:

    Name | Type | Description | Default
    prompt | str | Prompt generated for OpenAI from the template and the record data | required

    Returns:

    Type | Description
    bool | True if the prompt is valid, False otherwise
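    The word-based approximation can be worked through directly: at about 1.33 tokens per word, a 300-word prompt is estimated at 300 × 1.33 = 399 tokens. The check below is a sketch of that heuristic only; the 4096-token limit is a hypothetical example value (real limits depend on the model), and whether MEGAnno also reserves room for the response is not stated here.

```python
def estimate_tokens(prompt: str, tokens_per_word: float = 1.33) -> float:
    # Word-count heuristic from the docstring above: 1 word is about 1.33 tokens.
    return len(prompt.split()) * tokens_per_word

def is_valid_prompt_sketch(prompt: str, max_tokens: int = 4096) -> bool:
    # max_tokens is a hypothetical example limit; real limits depend on the model.
    return estimate_tokens(prompt) <= max_tokens

prompt = ' '.join(['word'] * 300)
print(estimate_tokens(prompt), is_valid_prompt_sketch(prompt))
```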

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.generate_prompts","title":"generate_prompts()","text":"

    Helper function. Given a prompt template and a list of records, generate a prompt for each record.

    Returns:

    Name | Type | Description
    prompts | list | List of (uuid, generated prompt) tuples, one for each record in the given subset

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_response_length","title":"get_response_length()","text":"

    Return the length of the OpenAI response.

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_openai_conf_score","title":"get_openai_conf_score()","text":"

    Return the confidence score of the label, calculated as the average of logit scores.

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.preprocess","title":"preprocess()","text":"

    Generate the list of prompts for each record based on the subset and template.

    Returns:

    Name | Type | Description
    prompts | list | List of prompts

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_llm_annotations","title":"get_llm_annotations(batch_size=1, num_retrials=2, api_name='chat', label_meta_names=[])","text":"

    Call OpenAI using the generated prompts, to obtain valid & invalid responses

    Parameters:

    Name Type Description Default batch_size int

    Size of batch to each Open AI prompt

    1 num_retrials int

    Number of retrials to OpenAI in case of failure in response

    2 api_name str

    Name of the OpenAI API, e.g. \"chat\" or \"completion\"

    'chat' label_meta_names

    List of label metadata names to set

    []

    Returns:

    Name Type Description responses list

    List of valid responses from OpenAI

    invalid_responses list

    List of invalid responses from OpenAI
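The batching and retry behavior controlled by batch_size and num_retrials can be sketched as follows. This is a hedged sketch, not MEGAnno's code. call_llm is a hypothetical stand-in for the OpenAI request: it takes a list of prompts and returns one output per prompt. The sketch also assumes that num_retrials counts retries after the first attempt.

```python
from typing import Callable, List, Tuple

Prompt = Tuple[str, str]  # (uuid, prompt text)


def get_llm_annotations(
    prompts: List[Prompt],
    call_llm: Callable[[List[str]], List[str]],
    batch_size: int = 1,
    num_retrials: int = 2,
):
    responses, invalid_responses = [], []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        uuids = [uuid for uuid, _ in batch]
        texts = [text for _, text in batch]
        # One initial attempt plus up to num_retrials retries per batch.
        for attempt in range(num_retrials + 1):
            try:
                outputs = call_llm(texts)
            except Exception as err:  # any API or network failure
                if attempt == num_retrials:
                    invalid_responses.extend((uuid, str(err)) for uuid in uuids)
                continue
            responses.extend(zip(uuids, outputs))
            break
    return responses, invalid_responses


# Usage with a fake client that fails once, then succeeds.
failures = ['timeout']


def flaky_llm(texts):
    if failures:
        raise RuntimeError(failures.pop())
    return ['sentiment: pos' for _ in texts]


ok, bad = get_llm_annotations([('u1', 'a'), ('u2', 'b')], flaky_llm, batch_size=2)
print(ok, bad)
```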

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.extract","title":"extract(uuid, response, fuzzy_extraction)","text":"

    Helper function for post-processing. Extract the label (name and value) from the OpenAI response

    Parameters:

    Name Type Description Default uuid str

    Record uuid

    required response str

    Output from OpenAI

    required fuzzy_extraction

    Set to True if fuzzy extraction desired in post processing

    required

    Returns:

    Name Type Description ret dict

    Returns the label name and label value
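A minimal sketch of label extraction with optional fuzzy matching follows. It assumes a response of the form label_name: value and takes the valid options as an extra argument. The docstring confirms neither assumption. The fuzzy step uses Python's difflib as a stand-in for whatever matching MEGAnno actually applies.

```python
import difflib
from typing import List, Optional


def extract(uuid: str, response: str, options: List[str],
            fuzzy_extraction: bool = False) -> Optional[dict]:
    # Assumed response format: 'sentiment: pos'
    name, _, value = response.partition(':')
    name, value = name.strip(), value.strip()
    if value not in options and fuzzy_extraction:
        # Snap a near miss (e.g. 'posi') to the closest valid option.
        matches = difflib.get_close_matches(value, options, n=1, cutoff=0.6)
        if matches:
            value = matches[0]
    if value not in options:
        return None  # unusable output, treated as invalid
    return {'uuid': uuid, 'label_name': name, 'label_value': value}


print(extract('u1', 'sentiment: posi', ['pos', 'neg'], fuzzy_extraction=True))
```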

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.post_process_annotations","title":"post_process_annotations(fuzzy_extraction=False)","text":"

    Extract outputs from the responses generated by the LLM and format them according to the MEGAnno data model.

    Parameters:

    Name Type Description Default fuzzy_extraction

    Set to True if fuzzy extraction desired in post processing

    False

    Returns:

    Name Type Description annotations list

    List of (uuid, label) annotations in the format required by MEGAnno

    "},{"location":"references/prompt/","title":"PromptTemplate","text":""},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate","title":"meganno_client.prompt.PromptTemplate","text":"

    The PromptTemplate class represents a prompt template for LLM annotation.

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.__init__","title":"__init__(label_schema, label_names=[], template='', **kwargs)","text":"

    Init function

    Parameters:

    Name Type Description Default label_schema list

    List of label objects

    required label_names list

    List of label names to be used for annotation, by default []

    [] template str

    Stringified template with input slot, by default ''

    ''"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_schema","title":"set_schema(label_schema, label_names)","text":"

    A helper function to set schema to be used in prompt template.

    Parameters:

    Name Type Description Default label_schema list

    List of label objects

    required label_names list

    List of label names to be used for annotation, by default all labels

    required"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_instruction","title":"set_instruction(**kwargs)","text":"

    Update template's task instruction and/or formatting instruction.

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.build_template","title":"build_template(task_inst, format_inst, f=lambda x: x)","text":"

    A helper function to build template. Return a stringified prompt template with input slot.

    Parameters:

    Name Type Description Default task_inst str

    Task instruction template. Must include '{name}' and '{options}'.

    required format_inst str

    Formatting instruction template. Must include '{format_sample}'.

    required f function

    Function used to decorate strings for printing (e.g. color()), by default lambda x: x

    lambda x: x

    Returns:

    Name Type Description template str

    Stringified prompt template with input slot
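The placeholder contract above can be illustrated with str.format. This is a sketch only. The name, options, and format_sample arguments and the trailing {input} slot marker are assumptions for illustration. They do not show exactly how MEGAnno assembles its template.

```python
from typing import Callable, List


def build_template(task_inst: str, format_inst: str, name: str,
                   options: List[str], format_sample: str,
                   f: Callable[[str], str] = lambda x: x) -> str:
    # task_inst must contain {name} and {options}; format_inst must contain {format_sample}.
    task = task_inst.format(name=name, options=', '.join(options))
    fmt = format_inst.format(format_sample=format_sample)
    # Hypothetical input slot, filled later for each record.
    return f(task) + ' ' + f(fmt) + ' Input: {input}'


template = build_template('Label the {name} as one of [{options}].',
                          'Reply as {format_sample}.',
                          'sentiment', ['pos', 'neg'], 'sentiment: pos')
print(template.replace('{input}', 'great flight'))
```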

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_template","title":"set_template(**kwargs)","text":"

    Update template by updating task instruction and/or formatting instruction.

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.get_template","title":"get_template()","text":"

    Return the stringified prompt template with input slot.

    Returns:

    Type Description str

    Stringified prompt template with input slot

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.get_prompt","title":"get_prompt(input_str: str, **kwargs)","text":"

    Return the prompt for a given input.

    Parameters:

    Name Type Description Default input_str str

    Input string to fill the input slot

    required

    Returns:

    Name Type Description prompt str

    The prompt obtained by filling the template's input slot with the given input string

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.preview","title":"preview(records=[])","text":"

    Open a widget to modify the prompt template and preview the final prompt.

    Parameters:

    Name Type Description Default records list

    List of input objects to be used for prompt preview

    []"},{"location":"references/schema/","title":"Schema","text":""},{"location":"references/schema/#meganno_client.schema.Schema","title":"meganno_client.schema.Schema","text":"

    The Schema class defines an annotation schema for a project.

    Attributes:

    Name Type Description __service object

    Service object for the connected project.

    "},{"location":"references/schema/#meganno_client.schema.Schema.set_schemas","title":"set_schemas(schemas=None)","text":"

    Set a user-defined schema

    Parameters:

    Name Type Description Default schemas dict

    Schema of the annotation task. Its label_schema field is a list of Python dictionaries. Each dictionary defines the name of a label, its level, and options, the list of valid label options.

    Full Example:

    {\n    \"label_schema\": [\n        {\n            \"name\": \"sentiment\",\n            \"level\": \"record\",\n            \"options\": [\n                {\n                    \"value\": \"pos\",\n                    \"text\": \"positive\"\n                },\n                {\n                    \"value\": \"neg\",\n                    \"text\": \"negative\"\n                }\n            ]\n        }\n    ]\n}\n

    None

    Raises:

    Type Description Exception

    If response code is not successful

    Returns:

    Name Type Description response json

    The JSON response
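Before calling set_schemas, a schema dictionary can be checked locally. The validator below is a hypothetical helper and is not part of meganno_client. It assumes that record and span, the two levels used in these docs, are the only valid levels.

```python
from typing import List

VALID_LEVELS = {'record', 'span'}  # assumption based on the levels used in these docs


def validate_schema(schemas: dict) -> List[str]:
    # Collect problems instead of stopping at the first one.
    errors = []
    labels = schemas.get('label_schema')
    if not labels:
        return ['label_schema must be a non-empty list']
    for i, label in enumerate(labels):
        if not label.get('name'):
            errors.append(f'label {i}: missing name')
        if label.get('level') not in VALID_LEVELS:
            errors.append(f'label {i}: unknown level')
        options = label.get('options') or []
        if not options or any('value' not in o or 'text' not in o for o in options):
            errors.append(f'label {i}: each option needs value and text')
    return errors


schema = {'label_schema': [{'name': 'sentiment', 'level': 'record',
                            'options': [{'value': 'pos', 'text': 'positive'},
                                        {'value': 'neg', 'text': 'negative'}]}]}
print(validate_schema(schema))
```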

    "},{"location":"references/schema/#meganno_client.schema.Schema.value","title":"value(active=None)","text":"

    Get project schema

    Parameters:

    Name Type Description Default active bool

    If True, retrieve only the active (latest) schema; if False, retrieve all previous schemas; if None, retrieve the full history.

    None"},{"location":"references/schema/#meganno_client.schema.Schema.get_active_schemas","title":"get_active_schemas()","text":"

    Get the active schema for the project.

    "},{"location":"references/schema/#meganno_client.schema.Schema.get_history","title":"get_history()","text":"

    Get the full history of project schemas

    "},{"location":"references/service/","title":"Service","text":""},{"location":"references/service/#meganno_client.service.Service","title":"meganno_client.service.Service","text":"

    Service objects communicate to back-end MEGAnno services and establish connections to a MEGAnno project.

    "},{"location":"references/service/#meganno_client.service.Service.__init__","title":"__init__(host=None, project=None, token=None, auth=None, port=5000)","text":"

    Init function

    Parameters:

    Name Type Description Default host str

    Host IP address for the back-end service to connect to. If None, connects to a Megagon-hosted service.

    None project str

    Project name. The name needs to be unique within the host domain.

    None token str

    User's authentication token.

    None auth Authentication

    Authentication object. Can be skipped if a valid token is provided.

    None"},{"location":"references/service/#meganno_client.service.Service.show","title":"show(config={})","text":"

    Show the project management dashboard as a floating widget.

    "},{"location":"references/service/#meganno_client.service.Service.get_service_endpoint","title":"get_service_endpoint(key=None)","text":"

    Get the REST endpoint for the connected project. Endpoints are composed of the base project URL and the route for a specific request.

    Parameters:

    Name Type Description Default key str

    Name of the specific request. The mapping from names to routes is stored in the SERVICE_ENDPOINTS dictionary in constants.py.

    None"},{"location":"references/service/#meganno_client.service.Service.get_base_payload","title":"get_base_payload()","text":"

    Get the base payload for any REST request, which includes the authentication token.

    "},{"location":"references/service/#meganno_client.service.Service.get_schemas","title":"get_schemas()","text":"

    Get schema object for the connected project.

    "},{"location":"references/service/#meganno_client.service.Service.get_statistics","title":"get_statistics()","text":"

    Get the statistics object for the project, which backs the calculations in the management dashboard.

    "},{"location":"references/service/#meganno_client.service.Service.get_users_by_uids","title":"get_users_by_uids(uids: list = [])","text":"

    Get user names by their unique IDs.

    Parameters:

    Name Type Description Default uids list

    List of unique user IDs.

    []"},{"location":"references/service/#meganno_client.service.Service.get_annotator","title":"get_annotator()","text":"

    Get the annotator's own name and user ID. The back-end service identifies the annotator by the token or auth object used to initialize the connection.

    "},{"location":"references/service/#meganno_client.service.Service.search","title":"search(limit=DEFAULT_LIST_LIMIT, skip=0, uuid_list=None, keyword=None, regex=None, record_metadata_condition=None, annotator_list=None, label_condition=None, label_metadata_condition=None, verification_condition=None)","text":"

    Search the back-end database based on user-provided predicates.

    Parameters:

    Name Type Description Default limit

    Maximum number of records returned in the subset.

    DEFAULT_LIST_LIMIT skip

    Number of records to skip: the first skip rows of the raw results, in import order, are excluded from the returned subset.

    0 uuid_list

    List of record UUIDs to filter on

    None keyword

    Term for exact keyword searches.

    None regex

    Term for regular expression searches.

    None record_metadata_condition

    Record-level metadata condition. {\"name\": # name of the record-level metadata to filter on \"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\", \"value\": # value to complete the expression}

    None annotator_list

    List of annotator names to filter on

    None label_condition

    Label condition of the annotation. {\"name\": # name of the label to filter on \"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\"|\"conflicts\", \"value\": # value to complete the expression}

    None label_metadata_condition

    Label metadata condition of the annotation. Note this can be on different labels than label_condition {\"label_name\": # name of the associated label \"name\": # name of the label-level metadata to filter on \"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\", \"value\": # value to complete the expression}

    None verification_condition

    Verification condition of the annotation. {\"label_name\": # name of the associated label \"search_mode\":\"ALL\"|\"UNVERIFIED\"|\"VERIFIED\"}

    None

    Returns:

    Name Type Description subset Subset

    Subset meeting the search conditions.
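The comparison operators accepted in record_metadata_condition can be mimicked locally to reason about which records a condition selects. This is a hypothetical in-memory sketch. The real filtering runs in the back-end database, and the behavior for records that lack the metadata is an assumption.

```python
import operator
from typing import Any, Dict

# Comparison operators listed for record_metadata_condition.
OPS = {'==': operator.eq, '<': operator.lt, '>': operator.gt,
       '<=': operator.le, '>=': operator.ge}


def meets_condition(metadata: Dict[str, Any], condition: Dict[str, Any]) -> bool:
    name, op = condition['name'], condition['operator']
    if op == 'exists':
        return name in metadata
    if name not in metadata:
        return False  # assumed: records without the metadata never match
    return OPS[op](metadata[name], condition['value'])


cond = {'name': 'length', 'operator': '>', 'value': 10}
print(meets_condition({'length': 12}, cond))
# A real query would pass the same dict, e.g.
# demo.search(record_metadata_condition=cond, limit=50)
```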

    "},{"location":"references/service/#meganno_client.service.Service.deprecate_submit_annotations","title":"deprecate_submit_annotations(subset=None, uuid_list=[])","text":"

    Submit annotations for records in a subset to the back-end service database. Results are filtered to only include annotations owned by the authenticated annotator.

    Parameters:

    Name Type Description Default subset Subset

    The subset object containing records and annotations.

    None uuid_list list

    Additional filter. Only subset records whose uuid are in this list will be submitted.

    []"},{"location":"references/service/#meganno_client.service.Service.submit_annotations","title":"submit_annotations(subset=None, uuid_list=[])","text":"

    Submit annotations for a batch of records in a subset to the back-end service database. Results are filtered to only include annotations owned by the authenticated annotator.

    Parameters:

    Name Type Description Default subset Subset

    The subset object containing records and annotations.

    None uuid_list list

    Additional filter. Only subset records whose uuid are in this list will be submitted.

    []"},{"location":"references/service/#meganno_client.service.Service.import_data_url","title":"import_data_url(url='', file_type=None, column_mapping={})","text":"

    Import data from a public URL; currently only CSV files are supported. Each row corresponds to a data record. The file needs at least two columns: one with a unique id for each row, and one with the raw data content.
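Because the file needs an id column with unique values and a content column, a local pre-check can catch problems before import. This helper is hypothetical and is not part of meganno_client. It reads CSV lines with Python's csv module and uses the same column_mapping shape as the parameter described below.

```python
import csv
from typing import Dict, Iterable


def check_csv(lines: Iterable[str], column_mapping: Dict[str, str]) -> int:
    # Returns the number of data rows if the file looks importable.
    reader = csv.DictReader(lines)
    id_col, content_col = column_mapping['id'], column_mapping['content']
    missing = {id_col, content_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    ids = [row[id_col] for row in reader]
    if len(ids) != len(set(ids)):
        raise ValueError('id column must be unique')
    return len(ids)


rows = ['index,tweet', '1,great flight', '2,delayed again']
print(check_csv(rows, {'id': 'index', 'content': 'tweet'}))
```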

    Parameters:

    Name Type Description Default url str

    Public URL of the CSV file

    '' file_type str

    Currently only supporting type 'CSV'

    None column_mapping dict

    Dictionary whose field id names the ID column and whose field content names the content column. For example, for a CSV file with the two columns index and tweet:

    {\n    \"id\": \"index\",\n    \"content\": \"tweet\"\n}\n

    {}"},{"location":"references/service/#meganno_client.service.Service.import_data_df","title":"import_data_df(df, column_mapping={})","text":"

    Import data from a pandas DataFrame. Each row corresponds to a data record. The DataFrame needs at least two columns: one with a unique id for each row, and one with the raw data content.

    Parameters:

    Name Type Description Default df DataFrame

    DataFrame to import, meeting the column requirements above

    required column_mapping dict

    Dictionary whose field id names the ID column and whose field content names the content column. With a DataFrame, metadata can be imported at the same time. For example, for a DataFrame with the columns index, tweet, and location:

    {\n    \"id\": \"index\",\n    \"content\": \"tweet\",\n    \"metadata\": \"location\"\n}\n
    Metadata named location will be created for all imported data records.

    {}"},{"location":"references/service/#meganno_client.service.Service.export","title":"export()","text":"

    Export the annotations of all records in the project.

    Returns:

    Name Type Description export_df DataFrame

    A pandas dataframe with columns 'data_id', 'content', 'annotator', 'label_name', 'label_value' for all records in the project

    "},{"location":"references/service/#meganno_client.service.Service.set_metadata","title":"set_metadata(meta_name, func, batch_size=500)","text":"

    Set metadata for all records in the back-end database, computed by a user-defined function.

    Parameters:

    Name Type Description Default meta_name str

    Name of the metadata. Will be used to identify and query the metadata.

    required func function(raw_content)

    Function that takes the raw data content as input and returns the corresponding metadata (e.g., an int, a string, or a vector).

    required batch_size int

    Batch size for back-end database updates.

    500 Example
    from sentence_transformers import SentenceTransformer\n\nmodel = SentenceTransformer('all-MiniLM-L6-v2')\n# set metadata generation function for service object demo\ndemo.set_metadata(\"bert-embedding\",\n                  lambda x: list(model.encode(x).astype(float)), 500)\n
    "},{"location":"references/service/#meganno_client.service.Service.get_assignment","title":"get_assignment(annotator=None, latest_only=False)","text":"

    Get workload assignment for annotator.

    Parameters:

    Name Type Description Default annotator str

    User ID to query. If set to None, use ID of auth token holder.

    None latest_only bool

    If True, return only the latest assignment for the user; otherwise, return the set of all assigned records.

    False"},{"location":"references/statistic/","title":"Statistic","text":""},{"location":"references/statistic/#meganno_client.statistic.Statistic","title":"meganno_client.statistic.Statistic","text":"

    The Statistic class contains methods to show basic statistics of the labeling project. Mostly used to back views in the monitoring dashboard.

    Attributes:

    Name Type Description __service Service

    Service object for the connected project.

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_label_progress","title":"get_label_progress()","text":"

    Get the overall progress of annotation.

    Returns:

    Name Type Description response dict

    A dictionary with two fields: total, the total number of data records, and annotated, the number of records that have at least one label from at least one annotator.

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_label_distributions","title":"get_label_distributions(label_name: str = None)","text":"

    Get the class distribution of a selected label. If multiple annotators labeled the same record, aggregate using majority vote.

    Parameters:

    Name Type Description Default label_name str

    Name of label as specified in the schema.

    None

    Returns:

    Name Type Description response dict

    A dictionary showing aggregated class frequencies. Example: {'neg': 60, 'neu': 14, 'pos': 27, 'tied_annotations': 3}. tied_annotations counts the records for which two or more classes tie in the majority vote.
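The majority-vote aggregation with tie counting can be sketched as follows. This is an illustrative stand-in and not the back-end's code. It assumes the per-record label values have already been collected.

```python
from collections import Counter
from typing import Dict, List


def label_distribution(labels_by_record: Dict[str, List[str]]) -> Dict[str, int]:
    dist = Counter()
    for labels in labels_by_record.values():
        ranked = Counter(labels).most_common()
        if not ranked:
            continue  # record has no label yet
        top = ranked[0][1]
        winners = [label for label, count in ranked if count == top]
        # Two or more classes sharing the top count is a tie.
        dist['tied_annotations' if len(winners) > 1 else winners[0]] += 1
    return dict(dist)


print(label_distribution({'r1': ['pos', 'pos', 'neg'], 'r2': ['neg'], 'r3': ['pos', 'neg']}))
```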

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_annotator_contributions","title":"get_annotator_contributions()","text":"

    Get contributions of annotators in terms of records labeled.

    Returns:

    Name Type Description response dict

    A dictionary where keys are annotator IDs and values are total numbers of annotated records by each annotator.

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_annotator_agreements","title":"get_annotator_agreements(label_name: str = None)","text":"

    Get pairwise agreement score between all contributing annotators to the project, on the specified label. The default agreement calculation method is cohen_kappa.

    Parameters:

    Name Type Description Default label_name str

    Name of label as specified in the schema.

    None

    Returns:

    Name Type Description response dict

    A dictionary where keys are pairs of annotator IDs and values are their agreement scores. A higher score means the pair of annotators agrees more consistently.
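For reference, the sketch below computes pairwise Cohen's kappa over the records that both annotators labeled. It is self-contained; the back-end may compute the score differently, for example through a library. Kappa is undefined when both annotators used the same single class, and this sketch chooses to return 1.0 in that case.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def cohen_kappa(a: List[str], b: List[str]) -> float:
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # undefined (0/0); treated as perfect agreement here
    return (observed - expected) / (1 - expected)


def pairwise_agreements(labels: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], float]:
    # labels: {annotator_id: {record_uuid: label_value}}
    scores = {}
    for u, v in combinations(sorted(labels), 2):
        shared = sorted(labels[u].keys() & labels[v].keys())
        if shared:
            scores[(u, v)] = cohen_kappa([labels[u][r] for r in shared],
                                         [labels[v][r] for r in shared])
    return scores


print(pairwise_agreements({'ann1': {'r1': 'pos', 'r2': 'neg'},
                           'ann2': {'r1': 'pos', 'r2': 'neg', 'r3': 'pos'}}))
```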

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_embeddings","title":"get_embeddings(label_name: str = None, embed_type: str = None)","text":"

    Return the 2-dimensional t-SNE projection of the text embeddings of data records, together with their aggregated labels (using majority vote). Used for the projection view in the monitoring dashboard.

    Parameters:

    Name Type Description Default label_name str

    Name of label as specified in the schema.

    None embed_type str

    The meta_name of the embedding to project

    None

    Returns:

    Name Type Description response dict

    A dictionary with the fields agg_label (the aggregated class label) and x_axis and y_axis (the projected 2D coordinates).

    "},{"location":"references/subset/","title":"Subset","text":""},{"location":"references/subset/#meganno_client.subset.Subset","title":"meganno_client.subset.Subset","text":"

    The Subset class represents a group of data records.

    Attributes:

    Name Type Description __data_uuids list

    List of unique identifiers of data records in the subset.

    __service Service

    Connected backend service

    __my_annotation_list list

    Local cache of the record and annotation view of the subset owned by service.annotator_id, with all possible metadata.

    "},{"location":"references/subset/#meganno_client.subset.Subset.__init__","title":"__init__(service, data_uuids=[], job_id=None)","text":"

    Init function

    Parameters:

    Name Type Description Default service Service

    Service-class object identifying the connected backend service and corresponding data storage

    required data_uuids list

    List of data UUIDs to include in the subset

    []"},{"location":"references/subset/#meganno_client.subset.Subset.get_uuid_list","title":"get_uuid_list()","text":"

    Get list of unique identifiers for all records in the subset.

    Returns:

    Name Type Description __data_uuids list

    List of data uuids included in Subset

    "},{"location":"references/subset/#meganno_client.subset.Subset.value","title":"value(annotator_list: list = None)","text":"

    Return the cached data and annotations of the service owner, or retrieve live (uncached) annotations of other annotators.

    Parameters:

    Name Type Description Default annotator_list list

    If None, return the cached annotations of the service owner; otherwise, fetch live annotations from the listed annotators.

    None

    Returns:

    Name Type Description subset_annotation_list list

    See __get_annotation_list for description and example.

    "},{"location":"references/subset/#meganno_client.subset.Subset.get_annotation_by_uuid","title":"get_annotation_by_uuid(uuid)","text":"

    Return the annotation for a particular data record (specified by uuid)

    Parameters:

    Name Type Description Default uuid str

    The uuid of the data record

    required

    Returns:

    Name Type Description annotation dict

    Annotation for the specified data record if it exists, else None

    "},{"location":"references/subset/#meganno_client.subset.Subset.show","title":"show(config={})","text":"

    Visualize the current subset in an in-notebook annotation widget.

    Development note: this initializes an Annotation widget, creating a unique reference to the associated subset and service.

    Parameters:

    Name Type Description Default config dict

    Configuration for default view of the widget.

    - view : \"single\" | \"table\", default \"single\"\n- mode : \"annotating\" | \"reconciling\", default \"annotating\"\n- title: default \"Annotation\"\n- height: default 300 (pixels)\n
    {}"},{"location":"references/subset/#meganno_client.subset.Subset.set_annotations","title":"set_annotations(uuid=None, labels=None)","text":"

    Set the annotation for a particular data record with the specified label

    Parameters:

    Name Type Description Default uuid str

    The uuid of the data record

    None labels dict

    The labels for the data record at record and span level, with the following structure:

    - \"labels_record\" : list\n    A list of record-level labels\n- \"labels_span\" : list\n    A list of span-level labels\n\nExamples\n-------\n\nExample of setting an annotation with the desired record and span level labels:\n```json\n{\n    \"labels_record\": [\n        {\n            \"label_name\": \"sentiment\",\n            \"label_value\": [\"neu\"]\n        }\n    ],\n\n    \"labels_span\": [\n        {\n            \"label_name\": \"sentiment\",\n            \"label_value\": [\"neu\"],\n            \"start_idx\": 10,\n            \"end_idx\": 20\n        }\n    ]\n}\n```\n
    None

    Raises:

    Type Description Exception

    If uuid or labels is None

    Returns:

    Name Type Description labels dict

    Updated labels for uuid annotated by user
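The labels argument can be assembled with a small helper. make_labels below is hypothetical and is not part of meganno_client. It only reproduces the structure shown in the example above. The check that start_idx is smaller than end_idx is an assumption about what makes a span valid.

```python
from typing import Dict, Iterable, Optional, Tuple

Span = Tuple[str, str, int, int]  # (label_name, value, start_idx, end_idx)


def make_labels(record_labels: Optional[Dict[str, str]] = None,
                span_labels: Iterable[Span] = ()) -> dict:
    labels = {'labels_record': [], 'labels_span': []}
    for name, value in (record_labels or {}).items():
        # label_value is a list, as in the documented example.
        labels['labels_record'].append({'label_name': name, 'label_value': [value]})
    for name, value, start, end in span_labels:
        if not 0 <= start < end:
            raise ValueError('span needs 0 <= start_idx < end_idx')
        labels['labels_span'].append({'label_name': name, 'label_value': [value],
                                      'start_idx': start, 'end_idx': end})
    return labels


labels = make_labels({'sentiment': 'neu'}, [('sentiment', 'neu', 10, 20)])
print(labels)
# Then, for a subset s: s.set_annotations(uuid=some_uuid, labels=labels)
```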

    "},{"location":"references/subset/#meganno_client.subset.Subset.get_reconciliation_data","title":"get_reconciliation_data(uuid_list=None)","text":"

    Return the list of reconciliation data for all data records specified by the user. The reconciliation data for one data record consists of its annotations by all annotators.

    Parameters:

    Name Type Description Default uuid_list list

    List of UUIDs provided by the user. If None, use all records in the subset.

    None

    Returns:

    Name Type Description reconciliation_data_list list

    List of reconciliation data, one entry per uuid, with the following keys: annotation_list (all annotations for the uuid), data (the raw data of the record), metadata (additional information about the data), tokens, and uuid (the uuid of the data record). Full Example:

    {\n    \"annotation_list\": [\n        {\n            \"annotator\": \"pwOA1N9RKZVJM8VZZ7w8VcT8lp22\",\n            \"labels_record\": [],\n            \"labels_span\": []\n        },\n        {\n            \"annotator\": \"IAzgHOxyeLQBi5QVo7dQR0p2DpA2\",\n            \"labels_record\": [\n                {\n                    \"label_name\": \"sentiment\",\n                    \"label_value\": [\"pos\"]\n                }\n            ],\n            \"labels_span\": []\n        }\n    ],\n    \"data\": \"@united obviously\",\n    \"metadata\": [],\n    \"tokens\": [],\n    \"uuid\": \"ee408271-df5d-435c-af25-72df58a21bfe\"\n}\n
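One use of this data is finding records where annotators disagree. The function below is a hypothetical helper that walks the structure shown in the example. It compares record-level labels only and ignores annotators who gave no value for the label.

```python
from typing import List


def find_conflicts(reconciliation_data_list: List[dict], label_name: str) -> List[str]:
    conflicts = []
    for item in reconciliation_data_list:
        values = set()
        for annotation in item['annotation_list']:
            for label in annotation['labels_record']:
                if label['label_name'] == label_name:
                    # label_value is a list; make it hashable.
                    values.add(tuple(label['label_value']))
        if len(values) > 1:
            conflicts.append(item['uuid'])
    return conflicts


data = [{'uuid': 'r1', 'annotation_list': [
    {'annotator': 'a1', 'labels_record': [{'label_name': 'sentiment', 'label_value': ['pos']}], 'labels_span': []},
    {'annotator': 'a2', 'labels_record': [{'label_name': 'sentiment', 'label_value': ['neg']}], 'labels_span': []},
]}]
print(find_conflicts(data, 'sentiment'))
```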
    "},{"location":"references/subset/#meganno_client.subset.Subset.suggest_similar","title":"suggest_similar(record_meta_name, limit=3)","text":"

    For each data record in the subset, suggest similar data records by retrieving the most similar records from the pool, based on metadata (e.g., embedding) distance.

    Parameters:

    Name Type Description Default record_meta_name str

    The metadata name, e.g. \"bert-embedding\", on which the similarity is calculated.

    required limit int

    Number of similar records to return. Default is 3.

    3

    Raises:

    Type Description Exception

    If response code is not successful

    Returns:

    Name Type Description subset Subset

    A subset of similar data entries
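The retrieval step can be pictured as a nearest-neighbor search over the stored embedding metadata. The sketch below is an in-memory stand-in. MEGAnno's distance measure is not stated, so using cosine similarity is an assumption. The sketch also assumes that limit applies per query record rather than to the whole result.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest_similar(query: List[str], embeddings: Dict[str, List[float]],
                    limit: int = 3) -> Set[str]:
    suggestions = set()
    pool = [uid for uid in embeddings if uid not in query]
    for q in query:
        # Rank the pool by similarity to this query record, most similar first.
        ranked = sorted(pool, key=lambda uid: cosine(embeddings[q], embeddings[uid]),
                        reverse=True)
        suggestions.update(ranked[:limit])
    return suggestions


embeddings = {'a': [1.0, 0.0], 'b': [0.9, 0.1], 'c': [0.0, 1.0]}
print(suggest_similar(['a'], embeddings, limit=1))
```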

    "},{"location":"references/subset/#meganno_client.subset.Subset.assign","title":"assign(annotator)","text":"

    Assign the current subset as payload to an annotator.

    Parameters:

    Name Type Description Default annotator str

    Annotator ID.

    required"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to MEGAnno documentation","text":""},{"location":"#how-to-get-started","title":"How to get started?","text":"

    There are 2 ways to get started with MEGAnno:

    1. Demo system access: We prepared a Google Colab notebook for this demo. To run the Colab notebook, you\u2019ll need a Google account, an OpenAI API key, and a MEGAnno access token (you can get this by filling out the request form).

    2. Your own MEGAnno environment: To set up MEGAnno for your own projects, you can set up your own self-hosted MEGAnno service. Please follow the self-hosted installation instructions.

    "},{"location":"#what-is-meganno","title":"What is MEGAnno?","text":"

    Many existing data annotation tools focus on annotators, enabling them to annotate data and manage annotation activities. MEGAnno, by contrast, is an open-source data annotation tool that puts the data scientist first, enabling you to bootstrap annotation tasks and manage the continual evolution of annotations through the machine learning lifecycle.

    In addition, MEGAnno\u2019s unique capabilities include:

    Figure 1. MEGAnno's unique capabilities

    "},{"location":"#system-overview","title":"System Overview","text":"

    MEGAnno provides two key components: (1) a Python client library featuring interactive widgets and (2) a back-end service consisting of web API and database servers. To use our system, a user can interact with a Jupyter Notebook that has the MEGAnno client installed. Through programmatic interfaces and UI widgets, the client communicates with the service. Figure 2. Overview of MEGAnno+ system.

    Please see the Getting Started page for setup instructions and the Advanced Features page for additional features.

    "},{"location":"#references","title":"References","text":"

    @inproceedings{kim-etal-2024-meganno,\n    title = \"{MEGA}nno+: A Human-{LLM} Collaborative Annotation System\",\n    author = \"Kim, Hannah and Mitra, Kushan and Li Chen, Rafael and Rahman, Sajjadur and Zhang, Dan\",\n    editor = \"Aletras, Nikolaos and De Clercq, Orphee\",\n    booktitle = \"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations\",\n    month = mar,\n    year = \"2024\",\n    address = \"St. Julians, Malta\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2024.eacl-demo.18\",\n    pages = \"168--176\",\n}\n
    @inproceedings{zhang-etal-2022-meganno,\n    title = \"{MEGA}nno: Exploratory Labeling for {NLP} in Computational Notebooks\",\n    author = \"Zhang, Dan and Kim, Hannah and Li Chen, Rafael and Kandogan, Eser and Hruschka, Estevam\",\n    editor = \"Dragut, Eduard and Li, Yunyao and Popa, Lucian and Vucetic, Slobodan and Srivastava, Shashank\",\n    booktitle = \"Proceedings of the Fourth Workshop on Data Science with Human-in-the-Loop (Language Advances)\",\n    month = dec,\n    year = \"2022\",\n    address = \"Abu Dhabi, United Arab Emirates (Hybrid)\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2022.dash-1.1\",\n    pages = \"1--7\",\n}\n

    "},{"location":"advanced/","title":"Advanced features","text":"

    This notebook provides examples of some of the advanced features.

    "},{"location":"advanced/#updating-schema","title":"Updating Schema","text":"

    Annotation requirements can change as projects evolve. To update the schema for a project, simply call set_schemas with the new schema object. For example, to expand the schema we set in the basic notebook:

    demo.get_schemas().set_schemas({\n    \"label_schema\": [\n        {\n            \"name\": \"sentiment\",\n            \"level\": \"record\",\n            \"options\": [\n                { \"value\": \"pos\", \"text\": \"positive\" },\n                { \"value\": \"neg\", \"text\": \"negative\" },\n                { \"value\": \"neu\", \"text\": \"neutral\" } # adding a new option\n            ]\n        },\n        # adding a span-level label\n        {\n            \"name\": \"sp\",\n            \"level\": \"span\",\n            \"options\": [\n                { \"value\": \"pos\", \"text\": \"positive\" },\n                { \"value\": \"neg\", \"text\": \"negative\" }\n            ]\n        }\n    ]\n})\n
    Only the latest schema will be active, but all previous ones will be preserved. To see the full history:
    demo.get_schemas().get_history()\n

    "},{"location":"advanced/#metadata","title":"Metadata","text":"

    In MEGAnno, metadata refers to auxiliary information associated with data records. MEGAnno takes user-defined functions to generate metadata and uses it to find important subsets and assist human annotators. Here we show two examples.

    Example 1: Adding Sentence-BERT embeddings for data records. The embeddings can later be used to compute similarities between records.

    # Example 1, adding sentence-bert embedding.\nfrom sentence_transformers import SentenceTransformer\nmodel = SentenceTransformer(\"all-MiniLM-L6-v2\")\n# set metadata generation function \ndemo.set_metadata(\"bert-embedding\",lambda x: list(model.encode(x).astype(float)), 500)\n

    Example 2: Extracting hashtags as annotation context.

    # user-defined function to extract hashtags\ndef extract_hashtags(text):\n    hashtag_list = []\n    for word in text.split():\n        if word.startswith(\"#\"):\n            hashtag_list.append(word)\n    # widget can render markdown text\n    return \"\".join([\"- {}\\n\".format(x) for x in hashtag_list])\n\n# apply metadata to the project\ndemo.set_metadata(\"hashtag\", extract_hashtags, 500)\n

    Once the hashtag metadata is set, the MEGAnno widget can show it as context at annotation time:

    s1= demo.search(keyword=\"\", limit=50, skip=0, meta_names=[\"hashtag\"])\ns1.show()\n

    "},{"location":"advanced/#advanced-subset-generation","title":"Advanced Subset Generation","text":"

    In addition to exact keyword matches, MEGAnno provides more advanced ways to generate subsets.

    "},{"location":"advanced/#regex-based-searches","title":"Regex-based Searches","text":"

    MEGAnno supports searches based on regular expressions:

    s2_reg= demo.search(regex=\".* (delay) .*\", limit=50, skip=0)\ns2_reg.show({\"view\": \"table\"})\n
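The pattern `.* (delay) .*` matches only when delay appears as a standalone word with a space on both sides. You can check what a pattern will catch before running a search by trying it locally with Python's re module. This assumes the service applies comparable regex semantics, which may not be exactly the case.

```python
import re

# Same pattern as the search above: 'delay' with a space on each side.
pattern = re.compile(r'.* (delay) .*')

samples = [
    'sorry for the delay today',    # matches
    'my flight was delayed again',  # no match: 'delayed' has no space right after 'delay'
    'another long delay',           # no match: nothing follows 'delay'
]
for text in samples:
    print(text, '->', bool(pattern.search(text)))
```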

    "},{"location":"advanced/#subset-suggestion","title":"Subset Suggestion","text":"

    User-initiated searches let users explore the dataset in a controlled way, but a search is only as good as the user's knowledge of the data and domain. To assist with exploration, MEGAnno provides an automated subset suggestion engine. Embedding-based suggestions are computed from data-embedding vectors that the user provides as metadata.

    For example, suggest_similar returns the nearest neighbors (by distance in the embedding space) of the records in the query subset:

    s3 = demo.search(keyword=\"delay\", limit=3, skip=0) # source subset\ns4 = s3.suggest_similar(\"bert-embedding\", limit=4) # needs to provide a valid meta_name\ns4.show()\n
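Conceptually, suggest_similar finds the records whose embeddings lie closest to those of the source subset. The sketch below shows one way such a nearest-neighbor lookup could work, using Euclidean distance over an in-memory dictionary of embeddings. Several details are assumptions, not MEGAnno's actual implementation: the helper itself, the distance metric, and the rule that source records are excluded from the results.

```python
import math
from typing import Dict, List, Sequence


def suggest_similar(source_ids: Sequence[str],
                    embeddings: Dict[str, List[float]],
                    limit: int) -> List[str]:
    # Hypothetical helper: rank non-source records by their distance
    # to the closest record in the source subset.
    source = set(source_ids)
    if not source:
        return []
    scored = []
    for rid, vec in embeddings.items():
        if rid in source:
            continue  # do not suggest records already in the subset
        dist = min(math.dist(vec, embeddings[s]) for s in source)
        scored.append((dist, rid))
    scored.sort()
    return [rid for _, rid in scored[:limit]]


emb = {
    'r1': [0.0, 0.0],
    'r2': [0.1, 0.0],
    'r3': [5.0, 5.0],
    'r4': [0.0, 0.3],
}
print(suggest_similar(['r1'], emb, limit=2))  # ['r2', 'r4']
```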
    "},{"location":"advanced/#subset-operations","title":"Subset Operations","text":"

    MEGAnno supports set operations to build more subsets from others:

    # intersection\ns_intersection = s1 & s2 # or s1.intersection(s2)\n# union\ns_union = s1 | s2 # or s1.union(s2)\n# difference\ns_diff = s1 - s2 # or s1.difference(s2)\n
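These operators follow ordinary set semantics. If you think of a subset as a set of record IDs, you can preview the results with plain Python sets. The IDs here are made up.

```python
# Hypothetical record IDs standing in for two subsets.
s1 = {'r1', 'r2', 'r3'}
s2 = {'r3', 'r4'}

print(sorted(s1 & s2))  # ['r3']: records in both subsets
print(sorted(s1 | s2))  # ['r1', 'r2', 'r3', 'r4']: records in either subset
print(sorted(s1 - s2))  # ['r1', 'r2']: records in s1 but not in s2
```

Note that difference is not symmetric: s2 - s1 would contain only r4.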

    "},{"location":"advanced/#dashboard-administrator-only","title":"Dashboard (administrator-only)","text":"

    MEGAnno provides a built-in visual monitoring dashboard that shows the real-time status of the annotation project. As projects evolve, users often need to understand the project\u2019s status to decide on next steps, such as collecting more data points with certain characteristics or adding a new class to the task definition. To aid such analysis, the dashboard widget packs common statistics and analytical visualizations (e.g., annotation progress, distribution of labels, annotator agreement), chosen based on a survey of our pilot users.

    To bring up the project dashboard:

    demo.show()\n

    Other features

    "},{"location":"basic/","title":"Basic Usages","text":"

    Please also refer to this notebook for a running example of the basic pipeline of using MEGAnno in a notebook.

    "},{"location":"basic/#setting-schema","title":"Setting Schema","text":"

    The schema defines the annotation task. For example, the following sets a schema for a sentiment analysis task with positive and negative options:

    demo.get_schemas().set_schemas({\n    \"label_schema\": [\n        {\n            \"name\": \"sentiment\",\n            \"level\": \"record\", \n            \"options\": [\n                { \"value\": \"pos\", \"text\": \"positive\" },\n                { \"value\": \"neg\", \"text\": \"negative\" },\n            ]\n        }\n    ]\n})\ndemo.get_schemas().value(active=True)       \n
    A label can be defined at either the record level or the span level. Record-level labels apply to the entire data record, while span-level labels are attached to a text span within the record. See Updating Schema for an example of a more complex schema.
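It can help to sanity-check the schema dictionary locally before calling set_schemas. The validator below is a hypothetical helper, not part of meganno_client. It checks only the rules visible in these docs: each label has a name, a level of record or span, and options that each carry a value and a text.

```python
from typing import Any, Dict, List

VALID_LEVELS = {'record', 'span'}


def validate_schema(schema: Dict[str, Any]) -> List[str]:
    # Collect every problem rather than stopping at the first one.
    labels = schema.get('label_schema')
    if not isinstance(labels, list) or not labels:
        return ['label_schema must be a non-empty list']
    errors = []
    for i, label in enumerate(labels):
        if not label.get('name'):
            errors.append(f'label {i}: missing name')
        if label.get('level') not in VALID_LEVELS:
            errors.append(f'label {i}: level must be record or span')
        options = label.get('options') or []
        if not options:
            errors.append(f'label {i}: at least one option is required')
        for opt in options:
            if not opt.get('value') or not opt.get('text'):
                errors.append(f'label {i}: every option needs value and text')
                break
    return errors


schema = {
    'label_schema': [
        {'name': 'sentiment', 'level': 'record',
         'options': [{'value': 'pos', 'text': 'positive'},
                     {'value': 'neg', 'text': 'negative'}]},
        {'name': 'sp', 'level': 'token', 'options': []},
    ]
}
print(validate_schema(schema))
# ['label 1: level must be record or span', 'label 1: at least one option is required']
```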

    "},{"location":"basic/#importing-data","title":"Importing Data","text":"

    Given a pandas dataframe like this (example generated from this Twitter US Airline Sentiment dataset):

    id | tweet
    0 | @united how else would I know it was denied?
    1 | @JetBlue my SIL bought tix for us to NYC. We were told at the gate that her cc was declined. Supervisor accused us of illegal activity.
    2 | @JetBlue dispatcher keeps yelling and hung up on me!

    To import the data, provide the column names for id (a unique identifier for each data record) and content (the raw text field):

    demo.import_data_df(df, column_mapping={\n    \"id\": \"id\",\n    \"content\": \"tweet\"\n})\n
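Because id must uniquely identify each record, it is worth checking the source table before importing. Here is a minimal stdlib sketch. It assumes rows are plain dictionaries (for example, from df.to_dict('records')) and uses the same column_mapping keys as above. The helper name is hypothetical.

```python
from typing import Dict, List


def check_import_rows(rows: List[Dict[str, object]],
                      column_mapping: Dict[str, str]) -> List[str]:
    # Hypothetical pre-import check: unique ids and non-empty content.
    id_col = column_mapping['id']
    content_col = column_mapping['content']
    problems = []
    seen = set()
    for row in rows:
        rid = row.get(id_col)
        if rid in seen:
            problems.append(f'duplicate id: {rid}')
        seen.add(rid)
        text = row.get(content_col)
        if not isinstance(text, str) or not text.strip():
            problems.append(f'empty content for id: {rid}')
    return problems


rows = [
    {'id': 0, 'tweet': '@united how else would I know it was denied?'},
    {'id': 1, 'tweet': ''},
    {'id': 1, 'tweet': '@JetBlue dispatcher keeps yelling and hung up on me!'},
]
print(check_import_rows(rows, {'id': 'id', 'content': 'tweet'}))
# ['empty content for id: 1', 'duplicate id: 1']
```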
    "},{"location":"basic/#exploratory-labeling","title":"Exploratory Labeling","text":"

    Not all data points are equally important for downstream models and applications. Users often want to prioritize a particular batch, for example to achieve better class or domain coverage or to focus on the data points that the downstream model cannot predict well. MEGAnno provides a flexible and controllable way of organizing annotation projects through exploratory labeling. In this process, users first identify an interesting subset and then assign labels to the data in it. We provide a set of \u201cpower tools\u201d to help identify valuable subsets.

    The script below shows an example of searching for data records containing the keyword "delay" and bringing up an annotation widget in the next cell. More examples here.

    # search results => subset s1\ns1 = demo.search(keyword=\"delay\", limit=10, skip=0)\n# bring up a widget \ns1.show()\n
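The limit and skip arguments page through the matching records. Assuming they follow the usual offset-pagination convention (skip the first skip matches, then return at most limit), you can mimic the behavior over a local list. The function below is an illustration only, not the client's search implementation. It uses a case-sensitive substring match, which may differ from the service's keyword matching.

```python
from typing import List


def search_page(records: List[str], keyword: str, limit: int, skip: int) -> List[str]:
    # Hypothetical local stand-in for keyword search with offset pagination.
    matches = [r for r in records if keyword in r]
    return matches[skip:skip + limit]


tweets = ['delay again', 'no delay', 'on time', 'delay 2h', 'big delay']
print(search_page(tweets, 'delay', limit=2, skip=0))  # ['delay again', 'no delay']
print(search_page(tweets, 'delay', limit=2, skip=2))  # ['delay 2h', 'big delay']
```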

    "},{"location":"basic/#column-filters","title":"Column Filters","text":"

    To view all column filters, click the "Filters" button; to reset all column filters, click the "Reset filters" button.

    "},{"location":"basic/#column-order-visibility","title":"Column Order & Visibility","text":"

    1. To reorder or resize a column, hover over the column's drag handle (use the left grip handle to reorder and the right column edge to resize).
    2. To toggle column visibility, click "Columns", then toggle a column to show or hide it.
    3. To reset column ordering and visibility, click the "Reset columns" button.

    "},{"location":"basic/#metadata-focus-view","title":"Metadata Focus-view","text":"

    To focus on a single metadata value, click the "Settings" button, then choose a metadata name from the list.

    "},{"location":"basic/#exporting","title":"Exporting","text":"

    Although iterations can happen within a single notebook, it's easy to export the data and the collected annotations:

    # export the annotations collected from all annotators\ndemo.export()\n
    "},{"location":"llm_integration/","title":"LLM Integration","text":"

    This notebook provides an example workflow of utilizing LLMs as annotation agents within MEGAnno.

    Figure 1. Human-LLM collaborative workflow.

    MEGAnno offers a simple human-LLM collaborative annotation workflow: LLM annotation followed by human verification. Put simply, LLM agents label data first (Figure 1, step \u2460), and humans verify LLM labels as needed. For most tasks and datasets one can use LLM labels as is; for some subset of difficult or uncertain instances (Figure 1, step \u2461), humans can verify LLM labels \u2013 confirm the right ones and correct the wrong ones (Figure 1, step \u2462). In this way, the LLM annotation part can be automated, and human efforts can be directed to where they are most needed to improve the quality of final labels.

    An overview of the entire system and its key concepts is shown below.

    Figure 2. Overview of MEGAnno+ system.

    Subset: refers to a slice of data created from user-defined searches.

    Record: refers to an item within the data corpus.

    Agent: an Agent is defined by the configuration of the LLM (e.g., model\u2019s name, version, and hyper-parameters) and a prompt template.

    Job: when an Agent is employed to annotate a selected data Subset, the execution is referred to as a Job.

    Label: stores the label assigned to a particular Record.

    Label_Metadata: captures additional aspects of a label, such as the LLM confidence score or the length of the label response.

    Verification: captures annotations from human users that confirm or update LLM labels.
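The dataclasses below are one hypothetical way to model the relationships among these concepts. The field names are assumptions chosen for illustration and do not mirror MEGAnno's database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Agent:
    uuid: str
    model_config: Dict[str, object]  # e.g. model name and hyper-parameters
    prompt_template: str


@dataclass
class Job:
    uuid: str
    agent_uuid: str          # the Agent that ran this job
    record_uuids: List[str]  # the Subset the job annotated


@dataclass
class Label:
    record_uuid: str
    label_name: str
    value: str
    job_uuid: Optional[str] = None  # set when the label came from an LLM job
    metadata: Dict[str, float] = field(default_factory=dict)  # Label_Metadata, e.g. confidence


@dataclass
class Verification:
    record_uuid: str
    label_name: str
    confirmed: bool
    corrected_value: Optional[str] = None  # only set when a human rejects the LLM label


agent = Agent('a1', {'model': 'gpt-3.5-turbo', 'temperature': 0.0}, 'Label the sentiment of: {input}')
job = Job('j1', agent.uuid, ['r1', 'r2'])
label = Label('r1', 'sentiment', 'pos', job_uuid=job.uuid, metadata={'conf': 0.62})
check = Verification('r1', 'sentiment', confirmed=False, corrected_value='neg')
```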

    "},{"location":"llm_integration/#llm-annotation","title":"LLM Annotation","text":"

    MEGAnno achieves LLM annotation in three steps, as shown in the figure below.

    Figure 3. Steps in the LLM annotation workflow.

    The preprocessing step handles the generation of prompts and validation of model configuration. Users can specify a particular LLM model, define its configurations and customize a prompt template (Figure 4). This defines an Agent which can be used for the annotation task. Registered Agents can be reused later.

    Figure 4. Prompt Template UI. Users can customize task instructions and preview generated prompts.

    After the selected model configuration is validated, the next step is calling the LLM. MEGAnno handles the call to the external LLM API to obtain LLM responses. Any API errors encountered during the call are also appropriately handled and a suitable message is relayed to the user.

    Once the responses are obtained, the post-processing step extracts the label from each LLM response. This step tolerates minor deviations in the response, such as a trailing period. Users can also set fuzzy_extraction=True, which performs a fuzzy match between the LLM response and the options in the label schema. If a sufficiently close match is found, the corresponding label is assigned. The figure below shows how MEGAnno's post-processing handles several LLM responses.

    Figure 5. Example LLM responses and post-processing results by MEGAnno.
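The sketch below approximates this kind of post-processing in three steps. First, it normalizes the raw response by lowercasing it and trimming whitespace and trailing punctuation. Next, it tries an exact match against the option values and texts. Finally, it falls back to a fuzzy match, but only when fuzzy_extraction is enabled. The fuzzy step uses difflib.get_close_matches with a 0.8 cutoff as a stand-in; MEGAnno's actual matching rules and threshold are not documented here.

```python
import difflib
from typing import Dict, List, Optional


def extract_label(response: str, options: List[Dict[str, str]],
                  fuzzy_extraction: bool = False,
                  cutoff: float = 0.8) -> Optional[str]:
    # Normalize: lowercase, trim whitespace and trailing punctuation such as a period.
    text = response.strip().lower().rstrip('.!?').strip()
    # Map both option values and display texts to the canonical value.
    lookup = {}
    for opt in options:
        lookup[opt['value'].lower()] = opt['value']
        lookup[opt['text'].lower()] = opt['value']
    if text in lookup:
        return lookup[text]
    if fuzzy_extraction:
        close = difflib.get_close_matches(text, list(lookup), n=1, cutoff=cutoff)
        if close:
            return lookup[close[0]]
    return None  # no valid label could be extracted


options = [{'value': 'pos', 'text': 'positive'}, {'value': 'neg', 'text': 'negative'}]
print(extract_label('Positive.', options))                       # pos
print(extract_label('positve', options))                         # None
print(extract_label('positve', options, fuzzy_extraction=True))  # pos
```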

    "},{"location":"llm_integration/#verification-subset-selection","title":"Verification Subset Selection","text":"

    Having a human verify every annotation in the dataset would defeat the purpose of using LLMs for cheaper and faster annotation. Instead, MEGAnno helps human verifiers by computing a confidence score for each annotation. Users can request that a confidence_score be computed and stored for the LLM labels. They can then view, sort, and filter on these scores to find the annotations the LLM was least confident about. This focuses human verification where it is most needed and makes the process more efficient.
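The selection step amounts to a filter and a sort over label metadata. In the sketch below, each LLM annotation is a dictionary with a hypothetical conf field, and the 0.7 threshold is an arbitrary example value. Neither reflects MEGAnno's actual data format.

```python
from typing import Any, Dict, List


def select_for_verification(annotations: List[Dict[str, Any]],
                            threshold: float = 0.7) -> List[Dict[str, Any]]:
    # Keep only low-confidence labels, least confident first.
    low = [a for a in annotations if a['conf'] < threshold]
    return sorted(low, key=lambda a: a['conf'])


annotations = [
    {'uuid': 'r1', 'label': 'pos', 'conf': 0.95},
    {'uuid': 'r2', 'label': 'neg', 'conf': 0.41},
    {'uuid': 'r3', 'label': 'pos', 'conf': 0.66},
]
print([a['uuid'] for a in select_for_verification(annotations)])  # ['r2', 'r3']
```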

    "},{"location":"llm_integration/#human-verification","title":"Human Verification","text":"

    Users can then use MEGAnno's in-notebook widget to verify LLM labels, i.e., either confirm a label as correct or reject it and specify the correct label. Users can then view the final annotations and export the data for downstream tasks or further analysis.

    Figure 6. Verification UI for exploring data and confirming/correcting LLM labels.
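After verification, each record's final label is the human correction when one exists and the LLM label otherwise. Here is a minimal sketch of that resolution rule. It uses hypothetical dictionaries keyed by record UUID rather than MEGAnno's export format.

```python
from typing import Dict


def resolve_final_labels(llm_labels: Dict[str, str],
                         corrections: Dict[str, str]) -> Dict[str, str]:
    # corrections holds only records whose LLM label a human rejected and replaced;
    # confirmed and unverified records keep the LLM label.
    return {uuid: corrections.get(uuid, value) for uuid, value in llm_labels.items()}


llm_labels = {'r1': 'pos', 'r2': 'neg', 'r3': 'pos'}
corrections = {'r2': 'pos'}
print(resolve_final_labels(llm_labels, corrections))  # {'r1': 'pos', 'r2': 'pos', 'r3': 'pos'}
```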

    "},{"location":"quickstart/","title":"Getting Started","text":""},{"location":"quickstart/#installation","title":"Installation","text":""},{"location":"quickstart/#self-hosted-service","title":"Self-hosted Service","text":""},{"location":"quickstart/#authentication","title":"Authentication","text":"

    There are two ways to authenticate with the service:

    1. Short-term (1-hour) access by signing in with a username and password.

    2. Long-term access with an access token, so you do not need to sign in every time.

    "},{"location":"quickstart/#roles","title":"Roles","text":"

    MEGAnno supports two user roles: Admin and Contributor. Admin users are the project owners who deploy the services; they have full access to the project, such as importing data or updating schemas. Admin users can invite contributors by sharing invitation codes with them. Contributors can only access their own annotation namespace and cannot modify the project.

    To invite contributors, follow the instructions below:

    1. Initialize an Admin object:
      from meganno_client import Admin, Authentication\ntoken = \"...\"\nauth = Authentication(project=\"eacl_demo\", token=token)\n\nadmin = Admin(project=\"eacl_demo\", auth=auth)\n# OR\nadmin = Admin(project=\"eacl_demo\", token=token)\n
    2. Generate an invitation code:
    3. To renew or revoke an existing invitation code:
    4. New users with a valid invitation code can sign up by installing the client library and following the instructions below:
    "},{"location":"quickstart/#role-access","title":"Role Access","text":"Method Route Role GET POST /agents administrator contributor GET /agents/jobs /agents/<string:agent_uuid>/jobs GET POST /agents/<string:agent_uuid>/jobs/<string:job_uuid> /annotations/<string:record_uuid> administrator contributor job POST /annotations/batch /annotations/<string:record_uuid>/labels administrator contributor /annotations/label_metadata administrator contributor job GET POST /assignments administrator contributor POST /data /data/metadata administrator GET /data/export /data/suggest_similar administrator contributor GET /schemas administrator contributor job POST administrator POST /verifications/<string:record_uuid>/labels administrator contributor GET /annotations /view/record /view/annotation /view/verifications administrator contributor job /reconciliations administrator contributor GET /statistics/annotator/contributions /statistics/annotator/agreements /statistics/embeddings/<embed_type> /statistics/label/progress /statistics/label/distributions administrator GET POST PUT DELETE /invitations administrator GET /invitations/<invitation_code> GET POST DELETE /tokens administrator contributor"},{"location":"references/controller/","title":"Controller","text":""},{"location":"references/controller/#meganno_client.controller.Controller","title":"meganno_client.controller.Controller","text":"

    The Controller class manages annotation agents and runs agent jobs.

    "},{"location":"references/controller/#meganno_client.controller.Controller.__init__","title":"__init__(service, auth)","text":"

    Init function

    Parameters:

    Name Type Description Default service Service

    MEGAnno service object for the connected project.

    required auth Authentication

    MEGAnno authentication object.

    required"},{"location":"references/controller/#meganno_client.controller.Controller.list_agents","title":"list_agents(created_by_filter=None, provider_filter=None, api_filter=None, show_job_list=False)","text":"

    Get the list of registered agents by their issuer IDs.

    Parameters:

    Name Type Description Default created_by_filter list

    List of user IDs to filter agents, by default None (if None, list all)

    None provider_filter

    Returns agents with the specified provider, e.g., openai

    None api_filter

    Returns agents with the specified API, e.g., completion

    None show_job_list

    If True, also return the list of job UUIDs for each agent.

    False

    Returns:

    Type Description list

    A list of agents that are created by specified issuers.

    "},{"location":"references/controller/#meganno_client.controller.Controller.list_jobs","title":"list_jobs(filter_by, filter_values, show_agent_details=False)","text":"

    Get the list of jobs with querying filters.

    Parameters:

    Name Type Description Default filter_by str

    Filter options. Must be [\"agent_uuid\" | \"issued_by\" | \"uuid\"] | None

    required filter_values list

    List of uuids of entity specified in 'filter_by'

    required show_agent_details bool

    If True, return agent configuration, by default False

    False

    Returns:

    Type Description list

    A list of jobs that match given filtering criteria.

    "},{"location":"references/controller/#meganno_client.controller.Controller.list_jobs_of_agent","title":"list_jobs_of_agent(agent_uuid, show_agent_details=False)","text":"

    Get the list of jobs of a given agent.

    Parameters:

    Name Type Description Default agent_uuid str

    Agent uuid

    required show_agent_details bool

    If True, return agent configuration, by default False

    False

    Returns:

    Type Description list

    A list of jobs of a given agent

    "},{"location":"references/controller/#meganno_client.controller.Controller.register_agent","title":"register_agent(model_config, prompt_template_str, provider_api)","text":"

    Register an agent with backend service.

    Parameters:

    Name Type Description Default model_config dict

    Model configuration object

    required prompt_template_str str

    Serialized prompt template

    required provider_api str

    Name of the provider and corresponding API, e.g., 'openai:chat'

    required

    Returns:

    Type Description dict

    Object with the unique agent ID.

    "},{"location":"references/controller/#meganno_client.controller.Controller.persist_job","title":"persist_job(agent_uuid, job_uuid, label_name, annotation_uuid_list)","text":"

    Given annotations for a subset, persist them as a job for the project.

    Parameters:

    Name Type Description Default agent_uuid str

    Agent uuid

    required job_uuid str

    Job uuid

    required label_name str

    Label name used for annotation

    required annotation_uuid_list list

    List of uuids of records that have valid annotations from the job

    required

    Returns:

    Type Description dict

    Object with job uuid and annotation count

    "},{"location":"references/controller/#meganno_client.controller.Controller.create_agent","title":"create_agent(model_config, prompt_template, provider_api='openai:chat')","text":"

    Validate model configs and register a new agent. Return new agent's uuid.

    Parameters:

    Name Type Description Default model_config dict

    Model configuration object

    required prompt_template str

    PromptTemplate object

    required provider_api str

    Name of the provider and corresponding API, e.g., 'openai:chat'

    'openai:chat'

    Returns:

    Name Type Description agent_uuid str

    Agent uuid

    "},{"location":"references/controller/#meganno_client.controller.Controller.get_agent_by_uuid","title":"get_agent_by_uuid(agent_uuid)","text":"

    Return agent model configuration, prompt template, and creator id of specified agent.

    Parameters:

    Name Type Description Default agent_uuid str

    Agent uuid

    required

    Returns:

    Type Description dict

    A dict containing agent details.

    "},{"location":"references/controller/#meganno_client.controller.Controller.list_my_agents","title":"list_my_agents()","text":"

    Get the list of registered agents by me.

    Returns:

    Name Type Description agents list

    A list of agents that are created by me.

    "},{"location":"references/controller/#meganno_client.controller.Controller.list_my_jobs","title":"list_my_jobs(show_agent_details=False)","text":"

    Get the list of jobs issued by me.

    Parameters:

    Name Type Description Default show_agent_details bool

    If True, return agent configuration, by default False

    False

    Returns:

    Name Type Description jobs list

    A list of jobs issued by me.

    "},{"location":"references/controller/#meganno_client.controller.Controller.run_job","title":"run_job(agent_uuid, subset, label_name, batch_size=1, num_retrials=2, label_meta_names=[], fuzzy_extraction=False)","text":"

    Create, run, and persist an LLM annotation job with given agent and subset.

    Parameters:

    Name Type Description Default agent_uuid str

    Uuid of an agent to be used for the job

    required subset Subset

    [Megagon-only] MEGAnno Subset object to be annotated in the job

    required label_name str

    Label name used for annotation

    required batch_size int

    Number of records in each batch sent to OpenAI

    1 num_retrials int

    Number of retries to OpenAI if a request fails

    2 label_meta_names

    List of label metadata names to be set

    [] fuzzy_extraction

    Set to True to enable fuzzy extraction during post-processing

    False

    Returns:

    Name Type Description job_uuid str

    Job uuid
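The batch_size and num_retrials parameters control how records are sent to the LLM and how failures are retried. The sketch below illustrates one reading of those parameters. Records are split into consecutive batches, and each batch is retried up to num_retrials extra times before it is recorded as failed. The call_llm callable is a hypothetical stand-in for the provider API, and catching RuntimeError stands in for the provider's specific errors. This is not the client's actual job runner.

```python
from typing import Callable, List, Sequence, Tuple


def run_batches(records: Sequence[str],
                call_llm: Callable[[List[str]], List[str]],
                batch_size: int = 1,
                num_retrials: int = 2) -> Tuple[List[str], List[List[str]]]:
    responses: List[str] = []
    failed: List[List[str]] = []
    for start in range(0, len(records), batch_size):
        batch = list(records[start:start + batch_size])
        # One initial attempt plus up to num_retrials retries.
        for attempt in range(num_retrials + 1):
            try:
                responses.extend(call_llm(batch))
                break
            except RuntimeError:
                if attempt == num_retrials:
                    failed.append(batch)  # give up on this batch
    return responses, failed


calls = {'n': 0}


def flaky_llm(batch: List[str]) -> List[str]:
    # Fails on the very first call only, then labels everything 'pos'.
    calls['n'] += 1
    if calls['n'] == 1:
        raise RuntimeError('temporary API error')
    return ['pos' for _ in batch]


print(run_batches(['a', 'b', 'c'], flaky_llm, batch_size=2))  # (['pos', 'pos', 'pos'], [])
```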

    "},{"location":"references/openai_job/","title":"OpenAIJob","text":""},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob","title":"meganno_client.llm_jobs.OpenAIJob","text":"

    The OpenAIJob class handles calls to OpenAI APIs.

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.__init__","title":"__init__(label_schema={}, label_names=[], records=[], model_config={}, prompt_template=None)","text":"

    Init function

    Parameters:

    Name Type Description Default label_schema list

    List of label objects

    {} label_names list

    List of label names to be used for annotation

    [] records list

    List of records in [{'data': , 'uuid': }] format

    [] model_config dict

    Parameters for the OpenAI model

    {} prompt_template str

    Template based on which prompt to OpenAI is prepared for each record

    None"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.set_openai_api_key","title":"set_openai_api_key(openai_api_key, openai_organization)","text":"

    Set the API keys necessary for calls to the OpenAI API

    Parameters:

    Name Type Description Default openai_api_key str

    OpenAI API key provided by user

    required openai_organization str[optional]

    OpenAI organization key provided by user

    required"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.validate_openai_api_key","title":"validate_openai_api_key(openai_api_key, openai_organization) staticmethod","text":"

    Validate the OpenAI API and organization keys provided by user

    Parameters:

    Name Type Description Default openai_api_key str

    OpenAI API key provided by user

    required openai_organization str[optional]

    OpenAI organization key provided by user

    required

    Raises:

    Type Description Exception

    If the API keys provided by the user are invalid, or if an error occurs when calling the OpenAI API

    Returns:

    Name Type Description openai_api_key str

    OpenAI API key

    openai_organization str

    OpenAI Organization key

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.validate_model_config","title":"validate_model_config(model_config, api_name='chat') staticmethod","text":"

    Validate the LLM model configuration provided by the user. The model must be among the models allowed on MEGAnno, and the parameters must match the format specified by OpenAI.

    Parameters:

    Name Type Description Default model_config dict

    Model specifications provided by the user, such as the model name and other parameters (e.g., temperature)

    required api_name str

    Name of the OpenAI API, e.g., \"chat\" or \"completion\"

    'chat'

    Raises:

    Type Description Exception

    If model is not among the ones provided by MEGAnno, or if configuration format is incorrect

    Returns:

    Name Type Description model_config dict

    Model configurations

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.is_valid_prompt","title":"is_valid_prompt(prompt)","text":"

    Validate the prompt generated. It should not exceed the maximum token limit specified by OpenAI. We use the approximation 1 word ~ 1.33 tokens

    Parameters:

    Name Type Description Default prompt str

    Prompt generated for OpenAI based on template and the record data

    required

    Returns:

    Type Description bool

    True if prompt is valid, False otherwise
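The word-count heuristic described above can be sketched as follows. Rounding up and the 4096-token limit are illustrative assumptions. Actual limits depend on the model, and a real tokenizer gives exact counts.

```python
import math

TOKENS_PER_WORD = 1.33  # approximation stated in the description above


def estimate_tokens(prompt: str) -> int:
    # Whitespace-separated words times 1.33, rounded up.
    return math.ceil(len(prompt.split()) * TOKENS_PER_WORD)


def is_valid_prompt(prompt: str, max_tokens: int = 4096) -> bool:
    # Hypothetical limit; the real one depends on the chosen model.
    return estimate_tokens(prompt) <= max_tokens


print(estimate_tokens('one two three'))  # 4
```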

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.generate_prompts","title":"generate_prompts()","text":"

    Helper function. Given a prompt template and a list of records, generate a prompt for each record.

    Returns:

    Name Type Description prompts list

    List of tuples of (uuid, generated prompt) for each record in given subset

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_response_length","title":"get_response_length()","text":"

    Return the length of the OpenAI response

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_openai_conf_score","title":"get_openai_conf_score()","text":"

    Return confidence score of the label, calculated using average of logit scores

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.preprocess","title":"preprocess()","text":"

    Generate the list of prompts for each record based on the subset and template

    Returns:

    Name Type Description prompts list

    List of prompts

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_llm_annotations","title":"get_llm_annotations(batch_size=1, num_retrials=2, api_name='chat', label_meta_names=[])","text":"

    Call OpenAI using the generated prompts, to obtain valid & invalid responses

    Parameters:

    Name Type Description Default batch_size int

    Number of records in each batch sent to OpenAI

    1 num_retrials int

    Number of retries to OpenAI if a request fails

    2 api_name str

    Name of the OpenAI API, e.g., \"chat\" or \"completion\"

    'chat' label_meta_names

    List of label metadata names to be set

    []

    Returns:

    Name Type Description responses list

    List of valid responses from OpenAI

    invalid_responses list

    List of invalid responses from OpenAI

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.extract","title":"extract(uuid, response, fuzzy_extraction)","text":"

    Helper function for post-processing. Extract the label (name and value) from the OpenAI response

    Parameters:

    Name Type Description Default uuid str

    Record uuid

    required response str

    Output from OpenAI

    required fuzzy_extraction

    Set to True to enable fuzzy extraction during post-processing

    required

    Returns:

    Name Type Description ret dict

    Returns the label name and label value

    "},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.post_process_annotations","title":"post_process_annotations(fuzzy_extraction=False)","text":"

    Extract outputs from the LLM-generated responses and format them according to the MEGAnno data model.

    Parameters:

    Name Type Description Default fuzzy_extraction

    Set to True to enable fuzzy extraction during post-processing

    False

    Returns:

    Name Type Description annotations list

    List of annotations (uuid, label) in format required by MEGAnno

    "},{"location":"references/prompt/","title":"PromptTemplate","text":""},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate","title":"meganno_client.prompt.PromptTemplate","text":"

    The PromptTemplate class represents a prompt template for LLM annotation.

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.__init__","title":"__init__(label_schema, label_names=[], template='', **kwargs)","text":"

    Init function

    Parameters:

    Name Type Description Default label_schema list

    List of label objects

    required label_names list

    List of label names to be used for annotation, by default []

    [] template str

    Stringified template with input slot, by default ''

    ''"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_schema","title":"set_schema(label_schema, label_names)","text":"

    A helper function to set schema to be used in prompt template.

    Parameters:

    Name Type Description Default label_schema []

    List of label objects

    required label_names []

    List of label names to be used for annotation, by default all labels

    required"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_instruction","title":"set_instruction(**kwargs)","text":"

    Update template's task instruction and/or formatting instruction.

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.build_template","title":"build_template(task_inst, format_inst, f=lambda x: x)","text":"

    A helper function to build template. Return a stringified prompt template with input slot.

    Parameters:

    Name Type Description Default task_inst str

    Task instruction template. Must include '{name}' and '{options}'.

    required format_inst str

    Formatting instruction template. Must include '{format_sample}'.

    required f function

    Use color() to decorate string for print, by default lambda x:x

    lambda x: x

    Returns:

    Name Type Description template str

    Stringified prompt template with input slot
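Here is a simplified sketch of how such a template could be assembled with str.format. The {name}, {options}, and {format_sample} placeholders come from the parameter descriptions above. The {input_str} slot marker, the helper names, and the way the parts are joined are assumptions, so the output of the real build_template will differ. The input slot is filled with str.replace so that braces in record text cannot break formatting.

```python
from typing import List


def build_prompt_template(task_inst: str, format_inst: str, name: str,
                          options: List[str], format_sample: str) -> str:
    # Fill the documented placeholders, then append an input slot for later.
    task = task_inst.format(name=name, options=', '.join(options))
    fmt = format_inst.format(format_sample=format_sample)
    return task + ' ' + fmt + ' Input: {input_str}'


def fill_input(template: str, input_str: str) -> str:
    # Fill the input slot with one record's text.
    return template.replace('{input_str}', input_str)


template = build_prompt_template(
    'Label the {name} of the text as one of: {options}.',
    'Answer in the format {format_sample}.',
    name='sentiment', options=['positive', 'negative'],
    format_sample='sentiment: positive',
)
print(fill_input(template, 'Flight delayed again'))
```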

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_template","title":"set_template(**kwargs)","text":"

    Update template by updating task instruction and/or formatting instruction.

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.get_template","title":"get_template()","text":"

    Return the stringified prompt template with input slot.

    Returns:

    Type Description string

    Stringified prompt template with input slot

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.get_prompt","title":"get_prompt(input_str: str, **kwargs)","text":"

    Return the prompt for a given input.

    Parameters:

    Name Type Description Default input_str str

    input string to fill input slot

    required

    Returns:

    Name Type Description prompt str

    A prompt built from the template and the given input string

    "},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.preview","title":"preview(records=[])","text":"

    Open up a widget to modify prompt template and preview final prompt.

    Parameters:

    Name Type Description Default records list

    List of input objects to be used for prompt preview

    []"},{"location":"references/schema/","title":"Schema","text":""},{"location":"references/schema/#meganno_client.schema.Schema","title":"meganno_client.schema.Schema","text":"

    The Schema class defines an annotation schema for a project.

    Attributes:

    Name Type Description __service object

    Service object for the connected project.

    "},{"location":"references/schema/#meganno_client.schema.Schema.set_schemas","title":"set_schemas(schemas=None)","text":"

    Set a user-defined schema

    Parameters:

    Name Type Description Default schemas dict

    Schema of annotation task which defines a label_schema which is a list of Python dictionaries defining the name of the label, the level of the label and options which defines a list of valid label options

    Full Example:

    {\n    \"label_schema\": [\n        {\n            \"name\": \"sentiment\",\n            \"level\": \"record\",\n            \"options\": [\n                {\n                    \"value\": \"pos\",\n                    \"text\": \"positive\"\n                },\n                {\n                    \"value\": \"neg\",\n                    \"text\": \"negative\"\n                }\n            ]\n        },\n\n    ]\n}\n

    None

    Raises:

    Type Description Exception

    If response code is not successful

    Returns:

    Name Type Description response json

    A json of the response

    "},{"location":"references/schema/#meganno_client.schema.Schema.value","title":"value(active=None)","text":"

    Get project schema

    Parameters:

    Name Type Description Default active bool

    If True, only retrieve the active (latest) schema; if False, retrieve all previous schemas; if None, retrieve the full history.

    None"},{"location":"references/schema/#meganno_client.schema.Schema.get_active_schemas","title":"get_active_schemas()","text":"

    Get the active schema for the project.

    "},{"location":"references/schema/#meganno_client.schema.Schema.get_history","title":"get_history()","text":"

    Get the full history of project schemas

    "},{"location":"references/service/","title":"Service","text":""},{"location":"references/service/#meganno_client.service.Service","title":"meganno_client.service.Service","text":"

    Service objects communicate to back-end MEGAnno services and establish connections to a MEGAnno project.

    "},{"location":"references/service/#meganno_client.service.Service.__init__","title":"__init__(host=None, project=None, token=None, auth=None, port=5000)","text":"

    Initialize a Service object.

    Parameters:

    Name Type Description Default host str

    Host IP address for the back-end service to connect to. If None, connects to a Megagon-hosted service.

    None project str

    Project name. The name needs to be unique within the host domain.

    None token str

    User's authentication token.

    None auth Authentication

    Authentication object. Can be skipped if a valid token is provided.

    None"},{"location":"references/service/#meganno_client.service.Service.show","title":"show(config={})","text":"

    Show the project management dashboard in a floating window.

    "},{"location":"references/service/#meganno_client.service.Service.get_service_endpoint","title":"get_service_endpoint(key=None)","text":"

    Get the REST endpoint for the connected project. Endpoints are composed of the base project URL and the route for a specific request.

    Parameters:

    Name Type Description Default key str

    Name of the specific request. The mapping from request names to routes is stored in the SERVICE_ENDPOINTS dictionary in constants.py.

    None"},{"location":"references/service/#meganno_client.service.Service.get_base_payload","title":"get_base_payload()","text":"

    Get the base payload for any REST request, which includes the authentication token.

    "},{"location":"references/service/#meganno_client.service.Service.get_schemas","title":"get_schemas()","text":"

    Get the schema object for the connected project.

    "},{"location":"references/service/#meganno_client.service.Service.get_statistics","title":"get_statistics()","text":"

    Get the statistics object for the project which supports calculations in the management dashboard.

    "},{"location":"references/service/#meganno_client.service.Service.get_users_by_uids","title":"get_users_by_uids(uids: list = [])","text":"

    Get user names by their unique IDs.

    Parameters:

    Name Type Description Default uids list

    List of unique user IDs.

    []"},{"location":"references/service/#meganno_client.service.Service.get_annotator","title":"get_annotator()","text":"

    Get the annotator's own name and user ID. The back-end service identifies the annotator by the token or auth object used to initialize the connection.

    "},{"location":"references/service/#meganno_client.service.Service.search","title":"search(limit=DEFAULT_LIST_LIMIT, skip=0, uuid_list=None, keyword=None, regex=None, record_metadata_condition=None, annotator_list=None, label_condition=None, label_metadata_condition=None, verification_condition=None)","text":"

    Search the back-end database based on user-provided predicates.

    Parameters:

    Name Type Description Default limit

    Maximum number of records returned in the subset.

    DEFAULT_LIST_LIMIT skip

    Number of records to skip: the first skip rows of the raw results, ordered by import order, are excluded from the returned subset.

    0 uuid_list

    List of record uuids to filter on.

    None keyword

    Term for exact keyword searches.

    None regex

    Term for regular expression searches.

    None record_metadata_condition

    Record-level metadata condition. {\"name\": # name of the record-level metadata to filter on \"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\", \"value\": # value to complete the expression}

    None annotator_list

    List of annotator names to filter on.

    None label_condition

    Label condition of the annotation. {\"name\": # name of the label to filter on \"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\"|\"conflicts\", \"value\": # value to complete the expression}

    None label_metadata_condition

    Label metadata condition of the annotation. Note that this condition can target a different label than label_condition. {\"label_name\": # name of the associated label \"name\": # name of the label-level metadata to filter on \"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\", \"value\": # value to complete the expression}

    None verification_condition

    Verification condition of the annotation. {\"label_name\": # name of the associated label \"search_mode\":\"ALL\"|\"UNVERIFIED\"|\"VERIFIED\"}

    None

    Returns:

    Name Type Description subset Subset

    Subset meeting the search conditions.
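    The condition arguments are plain dictionaries. The sketch below writes a few in the documented shape (the metadata name length and all values are made up). It also adds two hypothetical local helpers: matches, which mimics the comparison and exists operators, and page, which mimics skip and limit. It assumes the operators behave like Python comparisons and that skip applies to the filtered, import-ordered results. The real filtering happens in the back-end database, and neither conflicts nor verification_condition is modeled here.

```python
import operator

# Condition dictionaries in the documented shape (values are made up).
record_metadata_condition = {'name': 'length', 'operator': '>=', 'value': 20}
label_condition = {'name': 'sentiment', 'operator': '==', 'value': 'pos'}
verification_condition = {'label_name': 'sentiment', 'search_mode': 'UNVERIFIED'}

# Assumption: the comparison operators behave like Python's.
_OPS = {
    '==': operator.eq,
    '<': operator.lt,
    '>': operator.gt,
    '<=': operator.le,
    '>=': operator.ge,
}


def matches(metadata, condition):
    '''Hypothetical local check of one condition against a metadata dict.'''
    present = condition['name'] in metadata
    if condition['operator'] == 'exists':
        return present
    if not present:
        return False
    return _OPS[condition['operator']](metadata[condition['name']], condition['value'])


def page(records, skip=0, limit=10):
    '''Mimic skip/limit: drop the first skip rows, then keep at most limit.'''
    return records[skip:skip + limit]


records = [{'id': i, 'length': n} for i, n in enumerate([5, 25, 40, 12, 30])]
hits = [r for r in records if matches(r, record_metadata_condition)]
print([r['id'] for r in page(hits, skip=1, limit=1)])  # [2]
```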

    "},{"location":"references/service/#meganno_client.service.Service.deprecate_submit_annotations","title":"deprecate_submit_annotations(subset=None, uuid_list=[])","text":"

    Submit annotations for records in a subset to the back-end service database. Results are filtered to only include annotations owned by the authenticated annotator.

    Parameters:

    Name Type Description Default subset Subset

    The subset object containing records and annotations.

    None uuid_list list

    Additional filter. Only subset records whose uuids are in this list will be submitted.

    []"},{"location":"references/service/#meganno_client.service.Service.submit_annotations","title":"submit_annotations(subset=None, uuid_list=[])","text":"

    Submit annotations for a batch of records in a subset to the back-end service database. Results are filtered to only include annotations owned by the authenticated annotator.

    Parameters:

    Name Type Description Default subset Subset

    The subset object containing records and annotations.

    None uuid_list list

    Additional filter. Only subset records whose uuids are in this list will be submitted.

    []"},{"location":"references/service/#meganno_client.service.Service.import_data_url","title":"import_data_url(url='', file_type=None, column_mapping={})","text":"

    Import data from a public URL; currently only CSV files are supported. Each row corresponds to a data record. The file needs at least two columns: one with a unique ID for each row, and one with the raw data content.
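    To illustrate the column requirements, the sketch below parses a small CSV with the standard csv module and applies a column_mapping in the shape this method expects. The id column must hold unique values and the content column holds the raw text. The helper map_rows and the sample rows are hypothetical; import_data_url performs the actual import.

```python
import csv
import io

CSV_TEXT = '''index,tweet
1,@united thanks for the quick reply
2,@united obviously
'''

column_mapping = {'id': 'index', 'content': 'tweet'}


def map_rows(csv_text, mapping):
    '''Hypothetical helper: turn CSV rows into (id, content) pairs.'''
    reader = csv.DictReader(io.StringIO(csv_text))
    # Both mapped columns must exist in the header row.
    missing = {mapping['id'], mapping['content']} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    records, seen = [], set()
    for row in reader:
        row_id = row[mapping['id']]
        if row_id in seen:  # ids must be unique per row
            raise ValueError(f'duplicate id: {row_id}')
        seen.add(row_id)
        records.append((row_id, row[mapping['content']]))
    return records


print(map_rows(CSV_TEXT, column_mapping))
```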

    Parameters:

    Name Type Description Default url str

    Public URL of the CSV file.

    '' file_type str

    Currently, only type 'CSV' is supported.

    None column_mapping dict

    Dictionary with fields id, specifying the ID column name, and content, specifying the content column name. For example, for a CSV file with two columns, index and tweet:

    {\n    \"id\": \"index\",\n    \"content\": \"tweet\"\n}\n

    {}"},{"location":"references/service/#meganno_client.service.Service.import_data_df","title":"import_data_df(df, column_mapping={})","text":"

    Import data from a pandas DataFrame. Each row corresponds to a data record. The dataframe needs at least two columns: one with a unique id for each row, and one with the raw data content.

    Parameters:

    Name Type Description Default df DataFrame

    A DataFrame that meets the column requirements above.

    required column_mapping dict

    Dictionary with fields id, specifying the ID column name, and content, specifying the content column name. When importing from a DataFrame, users can import metadata at the same time. For example, for a DataFrame with columns index and tweet, plus a column location:

    {\n    \"id\": \"index\",\n    \"content\": \"tweet\",\n    \"metadata\": \"location\"\n}\n
    Metadata named location will be created for all imported data records.

    {}"},{"location":"references/service/#meganno_client.service.Service.export","title":"export()","text":"

    Export the annotations of all records in the project.

    Returns:

    Name Type Description export_df DataFrame

    A pandas dataframe with columns 'data_id', 'content', 'annotator', 'label_name', 'label_value' for all records in the project

    "},{"location":"references/service/#meganno_client.service.Service.set_metadata","title":"set_metadata(meta_name, func, batch_size=500)","text":"

    Set metadata for all records in the back-end database, based on a user-defined function that computes the metadata.

    Parameters:

    Name Type Description Default meta_name str

    Name of the metadata. Will be used to identify and query the metadata.

    required func function(raw_content)

    Function that takes the raw data content as input and returns the corresponding metadata (int, string, vectors, ...).

    required batch_size int

    Batch size for back-end database updates.

    500

    Example:
    from sentence_transformers import SentenceTransformer\n\nmodel = SentenceTransformer('all-MiniLM-L6-v2')\n# set metadata generation function for service object demo\ndemo.set_metadata(\"bert-embedding\",\n                  lambda x: list(model.encode(x).astype(float)), 500)\n
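    The example above requires sentence_transformers. As a dependency-free sketch of the func and batch_size parameters, the code below uses a toy metadata function, a 26-dimensional letter-count vector chosen only because it needs no model. It then splits the computed values into batches of at most batch_size, which is one plausible reading of how batched database updates are grouped. letter_vector and batched are hypothetical names, not part of meganno_client.

```python
import string


def letter_vector(raw_content):
    '''Toy metadata function: count of each ASCII letter (case-insensitive).'''
    text = raw_content.lower()
    return [float(text.count(ch)) for ch in string.ascii_lowercase]


def batched(items, batch_size=500):
    '''Yield consecutive slices of at most batch_size items.'''
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


contents = ['@united obviously', 'great flight', 'lost my bag']
batches = list(batched([letter_vector(c) for c in contents], batch_size=2))
print(len(batches), [len(b) for b in batches])  # 2 [2, 1]
```

With a connected Service object such as demo above, the function could then be registered with demo.set_metadata('letter-vector', letter_vector, 500).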
    "},{"location":"references/service/#meganno_client.service.Service.get_assignment","title":"get_assignment(annotator=None, latest_only=False)","text":"

    Get the workload assignment for an annotator.

    Parameters:

    Name Type Description Default annotator str

    User ID to query. If None, use the ID of the auth token holder.

    None latest_only bool

    If True, return only the latest assignment for the user; otherwise, return the set of all assigned records.

    False"},{"location":"references/statistic/","title":"Statistic","text":""},{"location":"references/statistic/#meganno_client.statistic.Statistic","title":"meganno_client.statistic.Statistic","text":"

    The Statistic class contains methods to show basic statistics of the labeling project. It is mostly used to back the views in the monitoring dashboard.

    Attributes:

    Name Type Description __service Service

    Service object for the connected project.

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_label_progress","title":"get_label_progress()","text":"

    Get the overall progress of annotation.

    Returns:

    Name Type Description response dict

    A dictionary with two fields: total, the total number of data records, and annotated, the number of records with at least one label from at least one annotator.

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_label_distributions","title":"get_label_distributions(label_name: str = None)","text":"

    Get the class distribution of a selected label. If multiple annotators labeled the same record, their labels are aggregated by majority vote.

    Parameters:

    Name Type Description Default label_name str

    Name of label as specified in the schema.

    None

    Returns:

    Name Type Description response dict

    A dictionary of aggregated class frequencies. Example: {'neg': 60, 'neu': 14, 'pos': 27, 'tied_annotations': 3}. tied_annotations counts the records where more than one class ties for the majority vote.
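    The aggregation can be sketched locally, assuming a mapping from record ID to the labels given by each annotator. label_distribution is a hypothetical illustration of majority voting. A record whose top vote count is shared by several classes is counted under tied_annotations. This is not the back-end implementation.

```python
from collections import Counter


def label_distribution(labels_by_record):
    '''Majority-vote class counts; records with a tied top vote are tallied
    under tied_annotations instead of any class.'''
    dist = Counter()
    for votes in labels_by_record.values():
        if not votes:
            continue  # unlabeled records do not contribute
        ranked = Counter(votes).most_common()
        top = ranked[0][1]
        winners = [label for label, n in ranked if n == top]
        if len(winners) > 1:
            dist['tied_annotations'] += 1
        else:
            dist[winners[0]] += 1
    return dict(dist)


labels_by_record = {
    'r1': ['pos', 'pos', 'neg'],
    'r2': ['neg'],
    'r3': ['pos', 'neg'],
    'r4': ['neu', 'neu'],
}
print(label_distribution(labels_by_record))
```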

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_annotator_contributions","title":"get_annotator_contributions()","text":"

    Get contributions of annotators in terms of records labeled.

    Returns:

    Name Type Description response dict

    A dictionary where keys are annotator IDs and values are total numbers of annotated records by each annotator.

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_annotator_agreements","title":"get_annotator_agreements(label_name: str = None)","text":"

    Get pairwise agreement scores between all annotators contributing to the project, on the specified label. The default agreement method is cohen_kappa.

    Parameters:

    Name Type Description Default label_name str

    Name of label as specified in the schema.

    None

    Returns:

    Name Type Description response dict

    A dictionary where keys are pairs of annotator IDs, and values are their agreement scores. The higher the score, the more strongly the pair of annotators agree.
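    For intuition about the scores, the sketch below computes Cohen's kappa for every annotator pair over the records both annotators labeled. The formula is (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement rate and p_e is the agreement expected by chance from each annotator's label frequencies. The input layout and the restriction to shared records are assumptions, and the service may compute agreements differently.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    '''Cohen's kappa for two equal-length label sequences.'''
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1:
        # Both used one identical label throughout: kappa is undefined;
        # this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def pairwise_agreements(labels_by_annotator):
    '''Kappa for every annotator pair, over the records both labeled.'''
    scores = {}
    for u, v in combinations(sorted(labels_by_annotator), 2):
        shared = sorted(set(labels_by_annotator[u]) & set(labels_by_annotator[v]))
        if not shared:
            continue
        a = [labels_by_annotator[u][r] for r in shared]
        b = [labels_by_annotator[v][r] for r in shared]
        scores[(u, v)] = cohen_kappa(a, b)
    return scores


labels = {
    'alice': {'r1': 'pos', 'r2': 'neg', 'r3': 'pos', 'r4': 'neg'},
    'bob': {'r1': 'pos', 'r2': 'neg', 'r3': 'neg', 'r4': 'neg'},
}
print(pairwise_agreements(labels))  # {('alice', 'bob'): 0.5}
```

For the two annotators shown, p_o = 0.75 and p_e = 0.5, so kappa = 0.5.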

    "},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_embeddings","title":"get_embeddings(label_name: str = None, embed_type: str = None)","text":"

    Return a 2-dimensional t-SNE projection of the text embeddings of data records, together with their aggregated labels (by majority vote). Used for the projection view in the monitoring dashboard.

    Parameters:

    Name Type Description Default label_name str

    Name of label as specified in the schema.

    None embed_type str

    The meta_name of the embedding to use.

    None

    Returns:

    Name Type Description response dict

    A dictionary with fields agg_label, the aggregated class label, and x_axis and y_axis, the projected 2D coordinates.

    "},{"location":"references/subset/","title":"Subset","text":""},{"location":"references/subset/#meganno_client.subset.Subset","title":"meganno_client.subset.Subset","text":"

    The Subset class represents a group of data records.

    Attributes:

    Name Type Description __data_uuids list

    List of unique identifiers of data records in the subset.

    __service Service

    Connected backend service

    __my_annotation_list list

    Local cache of the record and annotation view of the subset owned by service.annotator_id, with all possible metadata.

    "},{"location":"references/subset/#meganno_client.subset.Subset.__init__","title":"__init__(service, data_uuids=[], job_id=None)","text":"

    Init function

    Parameters:

    Name Type Description Default service Service

    Service-class object identifying the connected backend service and corresponding data storage

    required data_uuids list

    List of data uuids to include in the subset.

    []"},{"location":"references/subset/#meganno_client.subset.Subset.get_uuid_list","title":"get_uuid_list()","text":"

    Get list of unique identifiers for all records in the subset.

    Returns:

    Name Type Description __data_uuids list

    List of data uuids included in the subset.

    "},{"location":"references/subset/#meganno_client.subset.Subset.value","title":"value(annotator_list: list = None)","text":"

    Return the cached data and annotations of the service owner, or retrieve them for other annotators (not cached).

    Parameters:

    Name Type Description Default annotator_list list

    If None, return the cached annotations of the service owner; otherwise, fetch live annotations from the listed annotators.

    None

    Returns:

    Name Type Description subset_annotation_list list

    See __get_annotation_list for description and example.

    "},{"location":"references/subset/#meganno_client.subset.Subset.get_annotation_by_uuid","title":"get_annotation_by_uuid(uuid)","text":"

    Return the annotation for a particular data record (specified by uuid)

    Parameters:

    Name Type Description Default uuid str

    The uuid of the data record.

    required

    Returns:

    Name Type Description annotation dict

    Annotation for the specified data record if it exists, else None.

    "},{"location":"references/subset/#meganno_client.subset.Subset.show","title":"show(config={})","text":"

    Visualize the current subset in an in-notebook annotation widget.

    Development note: this initializes an Annotation widget, creating a unique reference to the associated subset and service.

    Parameters:

    Name Type Description Default config dict

    Configuration for default view of the widget.

    - view : \"single\" | \"table\", default \"single\"\n- mode : \"annotating\" | \"reconciling\", default \"annotating\"\n- title: default \"Annotation\"\n- height: default 300 (pixels)\n
    {}"},{"location":"references/subset/#meganno_client.subset.Subset.set_annotations","title":"set_annotations(uuid=None, labels=None)","text":"

    Set the annotation for a particular data record with the specified labels.

    Parameters:

    Name Type Description Default uuid str

    The uuid of the data record.

    None labels dict

    The labels for the data record at record and span level, with the following structure:

    - \"labels_record\" : list\n    A list of record-level labels\n- \"labels_span\" : list\n    A list of span-level labels\n\nExamples\n-------\n\nExample of setting an annotation with the desired record and span level labels:\n```json\n{\n    \"labels_record\": [\n        {\n            \"label_name\": \"sentiment\",\n            \"label_value\": [\"neu\"]\n        }\n    ],\n\n    \"labels_span\": [\n        {\n            \"label_name\": \"sentiment\",\n            \"label_value\": [\"neu\"],\n            \"start_idx\": 10,\n            \"end_idx\": 20\n        }\n    ]\n}\n```\n
    None

    Raises:

    Type Description Exception

    If uuid or labels is None

    Returns:

    Name Type Description labels dict

    Updated labels by the user for the record with the given uuid.
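    The hypothetical helper below checks a labels dictionary against the structure documented above before it is passed to set_annotations. It enforces 0 <= start_idx < end_idx for spans, with end_idx no larger than the text length when that is known. That span rule is an assumption, because this reference does not state whether end_idx is exclusive.

```python
def check_labels(labels, text_length=None):
    '''Hypothetical check of a labels dict in the documented shape.'''
    for key in ('labels_record', 'labels_span'):
        if not isinstance(labels.get(key, []), list):
            raise ValueError(f'{key} must be a list')
    for label in labels.get('labels_record', []) + labels.get('labels_span', []):
        if not label.get('label_name') or not isinstance(label.get('label_value'), list):
            raise ValueError('each label needs label_name and a list label_value')
    for span in labels.get('labels_span', []):
        start, end = span['start_idx'], span['end_idx']
        # Assumption: spans are non-empty and end_idx is at most the text length.
        if not 0 <= start < end or (text_length is not None and end > text_length):
            raise ValueError(f'bad span {start}-{end}')
    return True


labels = {
    'labels_record': [{'label_name': 'sentiment', 'label_value': ['neu']}],
    'labels_span': [
        {'label_name': 'sentiment', 'label_value': ['neu'], 'start_idx': 10, 'end_idx': 20},
    ],
}
print(check_labels(labels, text_length=30))  # True
```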

    "},{"location":"references/subset/#meganno_client.subset.Subset.get_reconciliation_data","title":"get_reconciliation_data(uuid_list=None)","text":"

    Return the list of reconciliation data for all data entries specified by the user. The reconciliation data for one data record consists of the annotations of that record by all annotators.

    Parameters:

    Name Type Description Default uuid_list list

    List of uuids provided by the user. If None, use all records in the subset.

    None

    Returns:

    Name Type Description reconciliation_data_list list

    List of reconciliation data for each uuid, with the following keys: annotation_list, all annotations for the uuid; data, the raw data of the record; metadata, additional information about the data; tokens; and uuid, the uuid of the data record.

    Full Example:

    {\n    \"annotation_list\": [\n        {\n            \"annotator\": \"pwOA1N9RKZVJM8VZZ7w8VcT8lp22\",\n            \"labels_record\": [],\n            \"labels_span\": []\n        },\n        {\n            \"annotator\": \"IAzgHOxyeLQBi5QVo7dQR0p2DpA2\",\n            \"labels_record\": [\n                {\n                    \"label_name\": \"sentiment\",\n                    \"label_value\": [\"pos\"]\n                }\n            ],\n            \"labels_span\": []\n        }\n    ],\n    \"data\": \"@united obviously\",\n    \"metadata\": [],\n    \"tokens\": [],\n    \"uuid\": \"ee408271-df5d-435c-af25-72df58a21bfe\"\n}\n
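    As an example of consuming this structure, the hypothetical helper below returns the uuids of records whose annotators gave different values for one record-level label. One design choice here is an assumption, not the documented reconciliation logic: an annotator with no label for that name does not count as disagreeing. The first annotator in the full example is such a case.

```python
def record_label_conflicts(reconciliation_data_list, label_name):
    '''Return uuids whose annotators gave different values for label_name.'''
    conflicts = []
    for entry in reconciliation_data_list:
        values = set()
        for annotation in entry['annotation_list']:
            for label in annotation['labels_record']:
                if label['label_name'] == label_name:
                    values.add(tuple(label['label_value']))
        if len(values) > 1:  # missing labels are ignored, not conflicts
            conflicts.append(entry['uuid'])
    return conflicts


data = [
    {
        'uuid': 'ee408271-df5d-435c-af25-72df58a21bfe',
        'annotation_list': [
            {'annotator': 'a1', 'labels_record': [], 'labels_span': []},
            {'annotator': 'a2', 'labels_record': [
                {'label_name': 'sentiment', 'label_value': ['pos']}], 'labels_span': []},
        ],
    },
    {
        'uuid': 'u2',
        'annotation_list': [
            {'annotator': 'a1', 'labels_record': [
                {'label_name': 'sentiment', 'label_value': ['neg']}], 'labels_span': []},
            {'annotator': 'a2', 'labels_record': [
                {'label_name': 'sentiment', 'label_value': ['pos']}], 'labels_span': []},
        ],
    },
]
print(record_label_conflicts(data, 'sentiment'))  # ['u2']
```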
    "},{"location":"references/subset/#meganno_client.subset.Subset.suggest_similar","title":"suggest_similar(record_meta_name, limit=3)","text":"

    For each data record in the subset, suggest similar data records by retrieving the most similar records from the pool, based on metadata (e.g., embedding) distance.

    Parameters:

    Name Type Description Default record_meta_name str

    The meta_name (e.g., \"bert-embedding\") on which the similarity is calculated.

    required limit int

    The number of similar records to return. Default is 3.

    3

    Raises:

    Type Description Exception

    If the response code is not successful.

    Returns:

    Name Type Description subset Subset

    A subset of similar data entries
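    This reference says similarity is based on metadata distance but does not name the metric. The sketch below assumes cosine similarity over embedding vectors. For one query vector, it returns the limit most similar records from a pool, optionally excluding the query record itself. cosine and top_k_similar are hypothetical helpers, not the back-end implementation.

```python
import math


def cosine(u, v):
    '''Cosine similarity of two equal-length vectors (0.0 if either is zero).'''
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k_similar(query, pool, limit=3, exclude=()):
    '''Return up to limit record ids from pool ranked by similarity to query.'''
    scored = [(cosine(query, vec), rid) for rid, vec in pool.items()
              if rid not in exclude]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rid for _, rid in scored[:limit]]


pool = {
    'a': [1.0, 0.0],
    'b': [0.9, 0.1],
    'c': [0.0, 1.0],
    'd': [0.7, 0.7],
}
print(top_k_similar(pool['a'], pool, limit=2, exclude={'a'}))  # ['b', 'd']
```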

    "},{"location":"references/subset/#meganno_client.subset.Subset.assign","title":"assign(annotator)","text":"

    Assign the current subset as payload to an annotator.

    Parameters:

    Name Type Description Default annotator str

    Annotator ID.

    required"}]} \ No newline at end of file diff --git a/1.5.3/sitemap.xml.gz b/1.5.3/sitemap.xml.gz index 4dd016f..17ba6f6 100644 Binary files a/1.5.3/sitemap.xml.gz and b/1.5.3/sitemap.xml.gz differ