diff --git a/1.5.3/.DS_Store b/1.5.3/.DS_Store new file mode 100644 index 0000000..56f2a9b Binary files /dev/null and b/1.5.3/.DS_Store differ diff --git a/1.5.3/index.html b/1.5.3/index.html index 65f027a..4aa14b1 100644 --- a/1.5.3/index.html +++ b/1.5.3/index.html @@ -672,7 +672,7 @@
-
Figure 1. MEGAnno unique capabilities
MEGAnno provides two key components: (1) a Python client library featuring interactive widgets and (2) a back-end service consisting of web API and database servers. To use our system, a user can interact with a Jupyter Notebook that has the MEGAnno client installed. Through programmatic interfaces and UI widgets, the client communicates with the service. diff --git a/1.5.3/search/search_index.json b/1.5.3/search/search_index.json index b012599..3985a5c 100644 --- a/1.5.3/search/search_index.json +++ b/1.5.3/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to MEGAnno documentation","text":""},{"location":"#how-to-get-started","title":"How to get started?","text":"
There are 2 ways to get started with MEGAnno:
1. Demo system access: We prepared a Google Colab notebook for this demo. To run the Colab notebook, you\u2019ll need a Google account, an OpenAI API key, and a MEGAnno access token (you can get this by filling out the request form).
2. Your own MEGAnno environment: To set up MEGAnno for your own projects, you can set up your own self-hosted MEGAnno service. Please follow the self-hosted installation instructions.
"},{"location":"#what-is-meganno","title":"What is MEGAnno?","text":"Many existing data annotation tools focus on the annotator enabling them to annotate data and manage annotation activities. Instead, MEGAnno is an open-source data annotation tool that puts the data scientist first, enabling you to bootstrap annotation tasks and manage the continual evolution of annotations through the machine learning lifecycle.
In addition, MEGAnno\u2019s unique capabilities include:
A back-end service that acts as a single source of truth and stores/manages all the evolution of annotation information through the lifecycle.
Power tools to explore data sets and select the best data to label. Accommodations for active learning and other techniques to prioritize your labeling work.
Explore the distribution of labels and the behavior of annotators to make decisions for subsequent labeling batches.
A data scientist-focused experience enabling you to manage annotation directly in your notebooks. This allows you to utilize existing Python functions and our built-in power tools to optimize your annotation process.
Figure 1. MEGAnno unique capabilities
"},{"location":"#system-overview","title":"System Overview","text":"MEGAnno provides two key components: (1) a Python client library featuring interactive widgets and (2) a back-end service consisting of web API and database servers. To use our system, a user can interact with a Jupyter Notebook that has the MEGAnno client installed. Through programmatic interfaces and UI widgets, the client communicates with the service. Figure 2. Overview of MEGAnno+ system.
Please see the Getting Started page for setup instructions and the Advanced Features page for more cool features we provide.
"},{"location":"#references","title":"References","text":"@inproceedings{kim-etal-2024-meganno,\n title = \"{MEGA}nno+: A Human-{LLM} Collaborative Annotation System\",\n author = \"Kim, Hannah and Mitra, Kushan and Li Chen, Rafael and Rahman, Sajjadur and Zhang, Dan\",\n editor = \"Aletras, Nikolaos and De Clercq, Orphee\",\n booktitle = \"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations\",\n month = mar,\n year = \"2024\",\n address = \"St. Julians, Malta\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2024.eacl-demo.18\",\n pages = \"168--176\",\n}\n
@inproceedings{zhang-etal-2022-meganno,\n title = \"{MEGA}nno: Exploratory Labeling for {NLP} in Computational Notebooks\",\n author = \"Zhang, Dan and Kim, Hannah and Li Chen, Rafael and Kandogan, Eser and Hruschka, Estevam\",\n editor = \"Dragut, Eduard and Li, Yunyao and Popa, Lucian and Vucetic, Slobodan and Srivastava, Shashank\",\n booktitle = \"Proceedings of the Fourth Workshop on Data Science with Human-in-the-Loop (Language Advances)\",\n month = dec,\n year = \"2022\",\n address = \"Abu Dhabi, United Arab Emirates (Hybrid)\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2022.dash-1.1\",\n pages = \"1--7\",\n}\n
"},{"location":"advanced/","title":"Advanced features","text":"This notebook provides examples of some of the advanced features.
"},{"location":"advanced/#updating-schema","title":"Updating Schema","text":"Annotation requirements can change as projects evolve. To update the schema for a project, simply call set_schemas
with the new schema object. For example, to expand the schema we set in the basic notebook:
demo.get_schemas().set_schemas({\n \"label_schema\": [\n {\n \"name\": \"sentiment\",\n \"level\": \"record\", \n \"options\": [\n { \"value\": \"pos\", \"text\": \"positive\" },\n { \"value\": \"neg\", \"text\": \"negative\" },\n { \"value\": \"neu\", \"text\": \"neutral\" } # adding a new option\n ]\n },\n # adding a span-level label\n {\n \"name\": \"sp\",\n \"level\": \"span\", \n \"options\": [\n { \"value\": \"pos\", \"text\": \"positive\" },\n { \"value\": \"neg\", \"text\": \"negative\" },\n ]\n }\n ]\n})\n
Only the latest schema will be active, but all previous ones will be preserved. To see the full history: demo.get_schemas().get_history()\n
"},{"location":"advanced/#metadata","title":"Metadata","text":"In MEGAnno, metadata refers to auxiliary information associated with data records. MEGAnno takes user-defined functions to generate metadata and uses it to find important subsets and assist human annotators. Here we show two examples.
Example 1: Adding sentence bert embeddings for data records. The embeddings can later be used to make similarity computations over records.
# Example 1, adding sentence-bert embedding.\nfrom sentence_transformers import SentenceTransformer\nmodel = SentenceTransformer(\"all-MiniLM-L6-v2\")\n# set metadata generation function \ndemo.set_metadata(\"bert-embedding\",lambda x: list(model.encode(x).astype(float)), 500)\n
Example 2: Extracting hashtags as annotation context.
# user defined function to extract hashtag\ndef extract_hashtags(text):\n    hashtag_list = []\n    for word in text.split():\n        if word[0] == \"#\":\n            hashtag_list.append(word)\n    # widget can render markdown text\n    return \"\".join([\"- {}\\n\".format(x) for x in hashtag_list])\n\n# apply metadata to the project\ndemo.set_metadata(\"hashtag\", lambda x: extract_hashtags(x), 500)\n
With hashtag
metadata, MEGAnno widget can show it as context at annotation time.
s1= demo.search(keyword=\"\", limit=50, skip=0, meta_names=[\"hashtag\"])\ns1.show()\n
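The hashtag extractor from Example 2 can be tried on its own before it is registered as metadata. A minimal standalone sketch follows; the sample tweet is made up and no MEGAnno calls are involved.

```python
def extract_hashtags(text):
    """Return the hashtags found in `text` as a markdown bullet list."""
    # str.split() never yields empty strings, so startswith is safe here
    hashtags = [word for word in text.split() if word.startswith("#")]
    return "".join("- {}\n".format(tag) for tag in hashtags)


# made-up sample record, for illustration only
sample = "@united flight was late again #delay #frustrated"
hashtag_markdown = extract_hashtags(sample)
print(hashtag_markdown)
```

A record without hashtags yields an empty string, so the widget would simply show no context for it.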
"},{"location":"advanced/#advanced-subset-generation","title":"Advanced Subset Generation","text":"In addition to exact keyword matches, MEGAnno also provides more advanced approaches of generating subsets.
"},{"location":"advanced/#regex-based-searches","title":"Regex-based Searches","text":"MEGAnno supports searches based on regular expressions:
s2_reg= demo.search(regex=\".* (delay) .*\", limit=50, skip=0)\ns2_reg.show({\"view\": \"table\"})\n
"},{"location":"advanced/#subset-suggestion","title":"Subset Suggestion","text":"Searches initiated by users can help them explore the dataset in a controlled way. Still, the quality of searches is only as good as users\u2019 knowledge about the data and domain. MEGAnno provides an automated subset suggestion engine to assist with exploration. Embedding-based suggestions make suggestions based on data-embedding vectors provided by the user (as metadata).
For example, suggest_similar suggests neighbors (based on distance in the embedding space) of data in the querying subset:
s3 = demo.search(keyword=\"delay\", limit=3, skip=0) # source subset\ns4 = s3.suggest_similar(\"bert-embedding\", limit=4) # needs to provide a valid meta_name\ns4.show()\n
"},{"location":"advanced/#subset-operations","title":"Subset Operations","text":"MEGAnno supports set operations to build more subsets from others:
# intersection\ns_intersection = s1 & s2 # or s1.intersection(s2)\n# union\ns_union = s1 | s2 # or s1.union(s2)\n# difference\ns_diff = s1 - s2 # or s1.difference(s2)\n
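The three operators mirror Python's built-in set operations. As a plain-Python analogy over record ids (the ids are made up, and this says nothing about how MEGAnno implements subsets internally):

```python
# hypothetical record ids returned by two different searches
s1_ids = {"r1", "r2", "r3"}
s2_ids = {"r2", "r3", "r4"}

intersection_ids = s1_ids & s2_ids  # records present in both subsets
union_ids = s1_ids | s2_ids         # records present in either subset
difference_ids = s1_ids - s2_ids    # records only in the first subset

print(sorted(intersection_ids), sorted(union_ids), sorted(difference_ids))
```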
"},{"location":"advanced/#dashboard-administrator-only","title":"Dashboard (administrator-only)","text":"MEGAnno provides a built-in visual monitoring dashboard to help users to get real-time status of the annotation project. As projects evolve, users would often need to understand the project\u2019s status to make decisions about the next steps, like collecting more data points with certain characteristics or adding a new class to the task definition. To aid such analysis, the dashboard widget packs common statistics and analytical visualizations (e.g., annotation progress, distribution of labels, annotator agreement, etc.) based on a survey of our pilot users.
To bring up the project dashboard:
demo.show()\n
Other features
Assignment and dispatch: You may assign a subset to a particular annotator
s1.assign(annotator_id)\n
Multiple annotators and reconciliation: You are also able to view a reconciled list of annotations from multiple annotators
s1.get_reconciliation_data()\n
Please also refer to this notebook for a running example of the basic pipeline of using MEGAnno in a notebook.
"},{"location":"basic/#setting-schema","title":"Setting Schema","text":"Schema defines the annotation task. Example of setting schema for a sentiment analysis task with positive and negative options.
demo.get_schemas().set_schemas({\n \"label_schema\": [\n {\n \"name\": \"sentiment\",\n \"level\": \"record\", \n \"options\": [\n { \"value\": \"pos\", \"text\": \"positive\" },\n { \"value\": \"neg\", \"text\": \"negative\" },\n ]\n }\n ]\n})\ndemo.get_schemas().value(active=True) \n
A label can be defined to have level record
or span
. Record-level labels correspond to the entire data record, while span-level labels are associated with a text span in the record. See Updating Schema for an example of a more complex schema."},{"location":"basic/#importing-data","title":"Importing Data","text":"Given a pandas dataframe like this (example generated from this Twitter US Airline Sentiment dataset):
id tweet
0 @united how else would I know it was denied?
1 @JetBlue my SIL bought tix for us to NYC. We were told at the gate that her cc was declined. Supervisor accused us of illegal activity.
2 @JetBlue dispatcher keeps yelling and hung up on me!
Importing data is easy by providing column names for id
which is a unique importing identifier for data records, and content
which is the raw text field.
demo.import_data_df(df, column_mapping={\n \"id\": \"id\",\n \"content\": \"tweet\"\n})\n
"},{"location":"basic/#exploratory-labeling","title":"Exploratory Labeling","text":"Not all data points are equally important for downstream models and applications. There are often cases where users might want to prioritize a particular batch (e.g., to achieve better class or domain coverage or focus on the data points that the downstream model cannot predict well). MEGAnno provides a flexible and controllable way of organizing annotation projects through the exploratory labeling. This annotation process is done by first identifying an interesting subset and assigning labels to data in the subset. We provide a set of \u201cpower tools\u201d to help identify valuable subsets.
The script below shows an example of searching for data records with keyword \"delay\" and bringing up a widget for annotation in the next cell. More examples here.
# search results => subset s1\ns1 = demo.search(keyword=\"delay\", limit=10, skip=0)\n# bring up a widget \ns1.show()\n
"},{"location":"basic/#column-filters","title":"Column Filters","text":"To view all column filters, click on \"Filters\" button; to reset all column filters, click on \"Reset filters\" button.
"},{"location":"basic/#column-order-visibility","title":"Column Order & Visibility","text":"1. To re-order and re-size column, mouse over column drag handler (left grip handler for re-order and right column edge for re-size). 2. To toggle column visiblity, click on \"Columns\", then toggle column to show/hide. 3. To reset column ordering and visibility, click on \"Reset columns\" button.
"},{"location":"basic/#metadata-focus-view","title":"Metadata Focus-view","text":"To focus on a single metadata value, click on \"Settings\" button, then choose a metadata name from the list.
"},{"location":"basic/#exporting","title":"Exporting","text":"Although iterations can happen within a single notebook, it's easy to export the data, and annotations collected:
# collecting the annotation generated by all annotators\ndemo.export()\n
"},{"location":"llm_integration/","title":"LLM Integration","text":"This notebook provides an example workflow of utilizing LLMs as annotation agents within MEGAnno.
Figure 1. Human-LLM collaborative workflow.
MEGAnno offers a simple human-LLM collaborative annotation workflow: LLM annotation followed by human verification. Put simply, LLM agents label data first (Figure 1, step \u2460), and humans verify LLM labels as needed. For most tasks and datasets one can use LLM labels as is; for some subset of difficult or uncertain instances (Figure 1, step \u2461), humans can verify LLM labels \u2013 confirm the right ones and correct the wrong ones (Figure 1, step \u2462). In this way, the LLM annotation part can be automated, and human efforts can be directed to where they are most needed to improve the quality of final labels.
An overview of the entire system and key concepts are shown below.
Figure 2. Overview of MEGAnno+ system.
Subset: refers to a slice of data created from user-defined searches.
Record: refers to an item within the data corpus.
Agent: an Agent is defined by the configuration of the LLM (e.g., model\u2019s name, version, and hyper-parameters) and a prompt template.
Job: when an Agent is employed to annotate a selected data Subset, the execution is referred to as a Job.
Label: stores the label assigned to a particular Record
Label_Metadata: captures additional aspects of a label, such as LLM confidence score or length of label response, etc.
Verification: captures annotations from human users that confirm or update LLM labels
"},{"location":"llm_integration/#llm-annotation","title":"LLM Annotation","text":"MEGAnno achieves LLM annotation in three steps, as shown in the figure below.
Figure 3. Steps in the LLM annotation workflow.
The preprocessing step handles the generation of prompts and validation of model configuration. Users can specify a particular LLM model, define its configurations and customize a prompt template (Figure 4). This defines an Agent which can be used for the annotation task. Registered Agents can be reused later.
Figure 4. Prompt Template UI. Users can customize task instructions and preview generated prompts.
After the selected model configuration is validated, the next step is calling the LLM. MEGAnno handles the call to the external LLM API to obtain LLM responses. Any API errors encountered during the call are also appropriately handled and a suitable message is relayed to the user.
Once the responses are obtained, the post-processing step extracts the label from the LLM response. Our post-processing step ensures some minor deviations in the LLM's response (such as trailing period) are handled. Furthermore, users can set fuzzy_extraction=True
which performs a fuzzy match between the LLM response and the label schema space, and if a significant match is found the corresponding label is attributed for the task. The figure below shows how MEGAnno's post-processing mechanism handles several LLM responses.
Figure 5. Example LLM responses and post-processing results by MEGAnno.
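As a rough illustration of the post-processing idea, the sketch below normalizes a response (case, surrounding whitespace, trailing period) and can fall back to a fuzzy match against the label options. It relies on Python's difflib; the function name extract_label, the cutoff value, and the matching strategy are assumptions made for this example, not MEGAnno's actual implementation.

```python
import difflib


def extract_label(response, options, fuzzy_extraction=False, cutoff=0.8):
    """Map a raw LLM response to one of `options`, or None if no match."""
    # tolerate minor deviations such as case and a trailing period
    cleaned = response.strip().rstrip(".").lower()
    if cleaned in options:
        return cleaned
    if fuzzy_extraction:
        # closest option whose similarity ratio is at least `cutoff`
        matches = difflib.get_close_matches(cleaned, options, n=1, cutoff=cutoff)
        if matches:
            return matches[0]
    return None


options = ["positive", "negative"]
exact = extract_label("Positive.", options)
strict_miss = extract_label("positve", options)
fuzzy_hit = extract_label("positve", options, fuzzy_extraction=True)
print(exact, strict_miss, fuzzy_hit)
```

A higher cutoff makes the fuzzy fallback stricter; responses that match no option are returned as None so they can be treated as invalid.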
"},{"location":"llm_integration/#verification-subset-selection","title":"Verification Subset Selection","text":"It would be redundant for a human to verify every annotation in the dataset as that would defeat the purpose of using LLMs for a cheap and faster annotation process. Instead, MEGAnno provides a possibility to aid the human verifiers by computing confidence scores for each annotation. Users can specify confidence_score
of the LLM labels to be computed and stored. They can then view the confidence scores, and even sort as well as filter over them to obtain only those annotations for which the LLM had low confidence scores. This will ease the human verification process and make it more efficient.
Users can then use MEGAnno's in-notebook widget to verify LLM labels i.e., either confirm a label as correct or reject the label and specify a correct label. Users may view the final annotations and export the data for downstream tasks or further analysis.
Figure 6. Verification UI for exploring data and confirming/correcting LLM labels.
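The selection of low-confidence labels for verification can be pictured in plain Python. In this sketch the record layout (uuid, label, confidence_score keys), the sample values, and the 0.8 threshold are all hypothetical; it only illustrates filtering and sorting by confidence, not the widget's behavior.

```python
# made-up LLM annotations with confidence scores
annotations = [
    {"uuid": "r1", "label": "pos", "confidence_score": 0.97},
    {"uuid": "r2", "label": "neg", "confidence_score": 0.55},
    {"uuid": "r3", "label": "pos", "confidence_score": 0.71},
]


def select_for_verification(annotations, threshold=0.8):
    """Return annotations below `threshold`, least confident first."""
    low = [a for a in annotations if a["confidence_score"] < threshold]
    return sorted(low, key=lambda a: a["confidence_score"])


to_verify = select_for_verification(annotations)
print([a["uuid"] for a in to_verify])
```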
"},{"location":"quickstart/","title":"Getting Started","text":""},{"location":"quickstart/#installation","title":"Installation","text":"We have 2 ways to authenticate with the service:
Short-term 1 hour access with username and password sign in.
After executing auth = Authentication(project=\"<project_name>\")
(this only works for notebook and terminal running on local computer), you will be provided with a sign in interface via a new browser tab.
After signing in, you will be able to generate a long-term personal access token by running auth.create_access_token(expiration_duration=7, note=\"testing\")
expiration_duration is in days.
For a non-expiring token, set expiration_duration to 0 (under the hood, it still expires after 100 years).
Long-term access with access token without signing in every time.
auth = Authentication(project=\"<project_name>\", token=\"<your_token>\")\n
MEGAnno supports 2 types of user roles: Admin and Contributor. Admin users are project owners deploying the services; they have full access to the project such as importing data or updating schemas. Admin users can invite contributors by sharing invitation code(s) with them. Contributors can only access their own annotation namespace and cannot modify the project.
To invite contributors, follow the instructions below:
from meganno_client import Admin, Authentication\ntoken = \"...\"\nauth = Authentication(project=\"<project_name>\", token=token)\n\nadmin = Admin(project=\"eacl_demo\", auth=auth)\n# OR\nadmin = Admin(project=\"eacl_demo\", token=token)\n
admin.create_invitation(single_use=True, code=\"<invitation_code>\", role_code=\"contributor\")\n
admin.get_invitations()\nadmin.renew_invitation(id=\"<invitation_code_id>\")\nadmin.revoke_invitation(id=\"<invitation_code_id>\")\n
auth = Authentication(project=\"<project_name>\")
, a new browser tab will present itself.GET
POST
/agents administrator
contributor
GET
/agents/jobs /agents/<string:agent_uuid>/jobs GET
POST
/agents/<string:agent_uuid>/jobs/<string:job_uuid> /annotations/<string:record_uuid> administrator
contributor
job
POST
/annotations/batch /annotations/<string:record_uuid>/labels administrator
contributor
/annotations/label_metadata administrator
contributor
job
GET
POST
/assignments administrator
contributor
POST
/data /data/metadata administrator
GET
/data/export /data/suggest_similar administrator
contributor
GET
/schemas administrator
contributor
job
POST
administrator
POST
/verifications/<string:record_uuid>/labels administrator
contributor
GET
/annotations /view/record /view/annotation /view/verifications administrator
contributor
job
/reconciliations administrator
contributor
GET
/statistics/annotator/contributions /statistics/annotator/agreements /statistics/embeddings/<embed_type> /statistics/label/progress /statistics/label/distributions administrator
GET
POST
PUT
DELETE
/invitations administrator
GET
/invitations/<invitation_code> GET
POST
DELETE
/tokens administrator
contributor
"},{"location":"references/controller/","title":"Controller","text":""},{"location":"references/controller/#meganno_client.controller.Controller","title":"meganno_client.controller.Controller
","text":"The Controller class manages annotation agents and runs agent jobs.
"},{"location":"references/controller/#meganno_client.controller.Controller.__init__","title":"__init__(service, auth)
","text":"Init function
Parameters:
Name Type Description Defaultservice
Service
MEGAnno service object for the connected project.
requiredauth
Authentication
MEGAnno authentication object.
required"},{"location":"references/controller/#meganno_client.controller.Controller.list_agents","title":"list_agents(created_by_filter=None, provider_filter=None, api_filter=None, show_job_list=False)
","text":"Get the list of registered agents by their issuer IDs.
Parameters:
Name Type Description Defaultcreated_by_filter
list
List of user IDs to filter agents, by default None (if None, list all)
None
provider_filter
Returns agents with the specified provider eg. openai
None
api_filter
Returns agents with the specified api eg. completion
None
show_job_list
If True, also return the list of job uuids of the agent.
False
Returns:
Type Descriptionlist
A list of agents that are created by specified issuers.
"},{"location":"references/controller/#meganno_client.controller.Controller.list_jobs","title":"list_jobs(filter_by, filter_values, show_agent_details=False)
","text":"Get the list of jobs with querying filters.
Parameters:
Name Type Description Defaultfilter_by
str
Filter options. Must be [\"agent_uuid\" | \"issued_by\" | \"uuid\"] | None
requiredfilter_values
list
List of uuids of entity specified in 'filter_by'
requiredshow_agent_details
bool
If True, return agent configuration, by default False
False
Returns:
Type Descriptionlist
A list of jobs that match given filtering criteria.
"},{"location":"references/controller/#meganno_client.controller.Controller.list_jobs_of_agent","title":"list_jobs_of_agent(agent_uuid, show_agent_details=False)
","text":"Get the list of jobs of a given agent.
Parameters:
Name Type Description Defaultagent_uuid
str
Agent uuid
requiredshow_agent_details
bool
If True, return agent configuration, by default False
False
Returns:
Type Descriptionlist
A list of jobs of a given agent
"},{"location":"references/controller/#meganno_client.controller.Controller.register_agent","title":"register_agent(model_config, prompt_template_str, provider_api)
","text":"Register an agent with backend service.
Parameters:
Name Type Description Defaultmodel_config
dict
Model configuration object
requiredprompt_template_str
str
Serialized prompt template
requiredprovider_api
str
Name of provider and corresponding api eg. 'openai:chat'
requiredReturns:
Type Descriptiondict
object with unique agent id.
"},{"location":"references/controller/#meganno_client.controller.Controller.persist_job","title":"persist_job(agent_uuid, job_uuid, label_name, annotation_uuid_list)
","text":"Given annotations for a subset, persist them as a job for the project.
Parameters:
Name Type Description Defaultagent_uuid
str
Agent uuid
requiredjob_uuid
str
Job uuid
requiredlabel_name
str
Label name used for annotation
requiredannotation_uuid_list
list
List of uuids of records that have valid annotations from the job
requiredReturns:
Type Descriptiondict
Object with job uuid and annotation count
"},{"location":"references/controller/#meganno_client.controller.Controller.create_agent","title":"create_agent(model_config, prompt_template, provider_api='openai:chat')
","text":"Validate model configs and register a new agent. Return new agent's uuid.
Parameters:
Name Type Description Defaultmodel_config
dict
Model configuration object
requiredprompt_template
str
PromptTemplate object
requiredprovider_api
str
Name of provider and corresponding api eg. 'openai:chat'
'openai:chat'
Returns:
Name Type Descriptionagent_uuid
str
Agent uuid
"},{"location":"references/controller/#meganno_client.controller.Controller.get_agent_by_uuid","title":"get_agent_by_uuid(agent_uuid)
","text":"Return agent model configuration, prompt template, and creator id of specified agent.
Parameters:
Name Type Description Defaultagent_uuid
str
Agent uuid
requiredReturns:
Type Descriptiondict
A dict containing agent details.
"},{"location":"references/controller/#meganno_client.controller.Controller.list_my_agents","title":"list_my_agents()
","text":"Get the list of registered agents by me.
Returns:
Name Type Descriptionagents
list
A list of agents that are created by me.
"},{"location":"references/controller/#meganno_client.controller.Controller.list_my_jobs","title":"list_my_jobs(show_agent_details=False)
","text":"Get the list of jobs issued by me.
Parameters:
Name Type Description Defaultshow_agent_details
bool
If True, return agent configuration, by default False
False
Returns:
Name Type Descriptionjobs
list
A list of jobs issued by me.
"},{"location":"references/controller/#meganno_client.controller.Controller.run_job","title":"run_job(agent_uuid, subset, label_name, batch_size=1, num_retrials=2, label_meta_names=[], fuzzy_extraction=False)
","text":"Create, run, and persist an LLM annotation job with given agent and subset.
Parameters:
Name Type Description Defaultagent_uuid
str
Uuid of an agent to be used for the job
requiredsubset
Subset
[Megagon-only] MEGAnno Subset object to be annotated in the job
requiredlabel_name
str
Label name used for annotation
requiredbatch_size
int
Batch size for each OpenAI prompt
1
num_retrials
int
Number of retrials to OpenAI in case of failure in response
2
label_meta_names
list of label metadata names to be set
[]
fuzzy_extraction
Set to True if fuzzy extraction desired in post processing
False
Returns:
Name Type Descriptionjob_uuid
str
Job uuid
"},{"location":"references/openai_job/","title":"OpenAIJob","text":""},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob","title":"meganno_client.llm_jobs.OpenAIJob
","text":"The OpenAIJob class handles calls to OpenAI APIs.
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.__init__","title":"__init__(label_schema={}, label_names=[], records=[], model_config={}, prompt_template=None)
","text":"Init function
Parameters:
Name Type Description Defaultlabel_schema
list
List of label objects
{}
label_names
list
List of label names to be used for annotation
[]
records
list
List of records in [{'data': , 'uuid': }] format
[]
model_config
dict
Parameters for the Open AI model
{}
prompt_template
str
Template based on which prompt to OpenAI is prepared for each record
None
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.set_openai_api_key","title":"set_openai_api_key(openai_api_key, openai_organization)
","text":"Set the API keys necessary for call to OpenAI API
Parameters:
Name Type Description Defaultopenai_api_key
str
OpenAI API key provided by user
requiredopenai_organization
str[optional]
OpenAI organization key provided by user
required"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.validate_openai_api_key","title":"validate_openai_api_key(openai_api_key, openai_organization)
staticmethod
","text":"Validate the OpenAI API and organization keys provided by user
Parameters:
Name Type Description Defaultopenai_api_key
str
OpenAI API key provided by user
requiredopenai_organization
str[optional]
OpenAI organization key provided by user
requiredRaises:
Type DescriptionException
If api keys provided by user are invalid, or if any error in calling OpenAI API
Returns:
Name Type Descriptionopenai_api_key
str
OpenAI API key
openai_organization
str
OpenAI Organization key
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.validate_model_config","title":"validate_model_config(model_config, api_name='chat')
staticmethod
","text":"Validate the LLM model config provided by user. Model should be among the models allowed on MEGAnno, and the parameters should match format specified by Open AI
Parameters:
Name Type Description Defaultmodel_config
dict
Model specifications such as model name, other parameters eg. temperature, as provided by user
requiredapi_name
str
Name of OpenAI api, e.g. \"chat\" or \"completion\"
'chat'
Raises:
Type DescriptionException
If model is not among the ones provided by MEGAnno, or if configuration format is incorrect
Returns:
Name Type Descriptionmodel_config
dict
Model configurations
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.is_valid_prompt","title":"is_valid_prompt(prompt)
","text":"Validate the prompt generated. It should not exceed the maximum token limit specified by OpenAI. We use the approximation 1 word ~ 1.33 tokens
Parameters:
Name Type Description Defaultprompt
str
Prompt generated for OpenAI based on template and the record data
requiredReturns:
Type Descriptionbool
True if prompt is valid, False otherwise
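The word-to-token approximation can be sketched as follows. Only the 1.33 factor comes from the description above; the function names and the 4096-token limit are hypothetical stand-ins, and the real limit depends on the chosen model.

```python
import math

TOKENS_PER_WORD = 1.33  # approximation stated in the description


def estimate_tokens(prompt):
    """Estimate token count from the whitespace-separated word count."""
    return math.ceil(len(prompt.split()) * TOKENS_PER_WORD)


def is_prompt_within_limit(prompt, max_tokens=4096):
    """Return True if the estimated token count fits the (assumed) limit."""
    return estimate_tokens(prompt) <= max_tokens


print(estimate_tokens("one two three"))
print(is_prompt_within_limit("hello world"))
```

Because this is only an estimate, a prompt close to the limit may still be rejected by the provider.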
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.generate_prompts","title":"generate_prompts()
","text":"Helper function. Given a prompt template and a list of records, generate a list of prompts for each record
Returns:
Name Type Descriptionprompts
list
List of tuples of (uuid, generated prompt) for each record in given subset
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_response_length","title":"get_response_length()
","text":"Return the length of the openai response
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_openai_conf_score","title":"get_openai_conf_score()
","text":"Return confidence score of the label, calculated using average of logit scores
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.preprocess","title":"preprocess()
","text":"Generate the list of prompts for each record based on the subset and template
Returns:
Name Type Descriptionprompts
list
List of prompts
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_llm_annotations","title":"get_llm_annotations(batch_size=1, num_retrials=2, api_name='chat', label_meta_names=[])
","text":"Call OpenAI using the generated prompts, to obtain valid & invalid responses
Parameters:
Name Type Description Defaultbatch_size
int
Batch size for each OpenAI prompt
1
num_retrials
int
Number of retrials to OpenAI in case of failure in response
2
api_name
str
Name of OpenAI api, e.g. \"chat\" or \"completion\"
'chat'
label_meta_names
list of label metadata names to be set
[]
Returns:
Name Type Descriptionresponses
list
List of valid responses from OpenAI
invalid_responses
list
List of invalid responses from OpenAI
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.extract","title":"extract(uuid, response, fuzzy_extraction)
","text":"Helper function for post-processing. Extract the label (name and value) from the OpenAI response
Parameters:
Name Type Description Defaultuuid
str
Record uuid
requiredresponse
str
Output from OpenAI
requiredfuzzy_extraction
Set to True if fuzzy extraction desired in post processing
requiredReturns:
Name Type Descriptionret
dict
Returns the label name and label value
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.post_process_annotations","title":"post_process_annotations(fuzzy_extraction=False)
","text":"Perform output extraction from the responses generated by LLM, and formats it according to MEGAnno data model.
Parameters:
Name Type Description Defaultfuzzy_extraction
Set to True if fuzzy extraction desired in post processing
False
Returns:
Name Type Descriptionannotations
list
List of annotations (uuid, label) in format required by MEGAnno
"},{"location":"references/prompt/","title":"PromptTemplate","text":""},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate","title":"meganno_client.prompt.PromptTemplate
","text":"The PromptTemplate class represents a prompt template for LLM annotation.
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.__init__","title":"__init__(label_schema, label_names=[], template='', **kwargs)
","text":"Init function
Parameters:
Name Type Description Defaultlabel_schema
list
List of label objects
requiredlabel_names
list
List of label names to be used for annotation, by default []
[]
template
str
Stringified template with input slot, by default ''
''
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_schema","title":"set_schema(label_schema, label_names)
","text":"A helper function to set schema to be used in prompt template.
Parameters:
Name Type Description Defaultlabel_schema
[]
List of label objects
requiredlabel_names
[]
List of label names to be used for annotation, by default all labels
required"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_instruction","title":"set_instruction(**kwargs)
","text":"Update template's task instruction and/or formatting instruction.
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.build_template","title":"build_template(task_inst, format_inst, f=lambda x: x)
","text":"A helper function to build template. Return a stringified prompt template with input slot.
Parameters:
Name Type Description Defaulttask_inst
str
Task instruction template. Must include '{name}' and '{options}'.
requiredformat_inst
str
Formatting instruction template. Must include '{format_sample}'.
requiredf
function
Use color() to decorate string for print, by default lambda x:x
lambda x: x
Returns:
Name Type Descriptiontemplate
str
Stringified prompt template with input slot
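As a rough illustration of how the three documented placeholders could combine into a template with an input slot, here is a minimal sketch. It is not the meganno_client implementation: the function build_template_sketch, the slot name {input}, and the sample instruction strings are all assumptions made for this example.

```python
# Hypothetical stand-in for illustration; not the meganno_client implementation.
def build_template_sketch(task_inst, format_inst, label_name, options, format_sample):
    # Fill the documented placeholders in each instruction.
    task = task_inst.format(name=label_name, options=', '.join(options))
    fmt = format_inst.format(format_sample=format_sample)
    # Keep a literal '{input}' slot (assumed name) to be filled per record later.
    return task + ' ' + fmt + ' Input: {input}'


template = build_template_sketch(
    'Label the {name} of the text as one of: {options}.',
    'Answer in the form {format_sample}.',
    'sentiment',
    ['pos', 'neg'],
    'sentiment=pos',
)
# Filling the slot for one record yields the final prompt.
prompt = template.replace('{input}', 'my flight was delayed')
print(prompt)
```

Keeping the input slot unformatted until the last step means the same template string can be reused for every record in a subset.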
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_template","title":"set_template(**kwargs)
","text":"Update template by updating task instruction and/or formatting instruction.
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.get_template","title":"get_template()
","text":"Return the stringified prompt template with input slot.
Returns:
Type Descriptionstring
Stringified prompt template with input slot
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.get_prompt","title":"get_prompt(input_str: str, **kwargs)
","text":"Return the prompt for a given input.
Parameters:
Name Type Description Defaultinput_str
str
input string to fill input slot
requiredReturns:
Name Type Descriptionprompt
str
a prompt template built with given input string
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.preview","title":"preview(records=[])
","text":"Open up a widget to modify prompt template and preview final prompt.
Parameters:
Name Type Description Defaultrecords
list
List of input objects to be used for prompt preview
[]
"},{"location":"references/schema/","title":"Schema","text":""},{"location":"references/schema/#meganno_client.schema.Schema","title":"meganno_client.schema.Schema
","text":"The Schema class defines an annotation schema for a project.
Attributes:
Name Type Description__service
object
Service object for the connected project.
"},{"location":"references/schema/#meganno_client.schema.Schema.set_schemas","title":"set_schemas(schemas=None)
","text":"Set a user-defined schema
Parameters:
Name Type Description Defaultschemas
dict
Schema of annotation task which defines a label_schema
which is a list of Python dictionaries defining the name
of the label, the level
of the label and options
which defines a list of valid label options
Full Example:
{\n \"label_schema\": [\n {\n \"name\": \"sentiment\",\n \"level\": \"record\",\n \"options\": [\n {\n \"value\": \"pos\",\n \"text\": \"positive\"\n },\n {\n \"value\": \"neg\",\n \"text\": \"negative\"\n }\n ]\n },\n\n ]\n}\n
None
Raises:
Type DescriptionException
If response code is not successful
Returns:
Name Type Descriptionresponse
json
A json of the response
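Before sending a schema to the service, it can help to sanity-check its shape locally. The sketch below is a hypothetical client-side check (check_schema is not part of meganno_client, and the real service may validate differently); it only relies on the fields shown in the example above and on the two levels, record and span, that the documentation mentions.

```python
def check_schema(schema):
    # Hypothetical client-side check; the real service may validate differently.
    problems = []
    for label in schema.get('label_schema', []):
        name = label.get('name')
        if not name:
            problems.append('label without a name')
        if label.get('level') not in ('record', 'span'):
            problems.append('{}: level must be record or span'.format(name))
        values = [opt.get('value') for opt in label.get('options', [])]
        if not values:
            problems.append('{}: no options'.format(name))
        if len(values) != len(set(values)):
            problems.append('{}: duplicate option values'.format(name))
    return problems


schema = {
    'label_schema': [
        {
            'name': 'sentiment',
            'level': 'record',
            'options': [
                {'value': 'pos', 'text': 'positive'},
                {'value': 'neg', 'text': 'negative'},
            ],
        }
    ]
}
print(check_schema(schema))
```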
"},{"location":"references/schema/#meganno_client.schema.Schema.value","title":"value(active=None)
","text":"Get project schema
Parameters:
Name Type Description Defaultactive
bool
If True
, only retrieve the active (latest) schema; if False
, retrieve all previous schemas; if None
, retrieve the full history.
None
"},{"location":"references/schema/#meganno_client.schema.Schema.get_active_schemas","title":"get_active_schemas()
","text":"Get the active schema for the project.
"},{"location":"references/schema/#meganno_client.schema.Schema.get_history","title":"get_history()
","text":"Get the full history of project schemas
"},{"location":"references/service/","title":"Service","text":""},{"location":"references/service/#meganno_client.service.Service","title":"meganno_client.service.Service
","text":"Service objects communicate to back-end MEGAnno services and establish connections to a MEGAnno project.
"},{"location":"references/service/#meganno_client.service.Service.__init__","title":"__init__(host=None, project=None, token=None, auth=None, port=5000)
","text":"Init function
Parameters:
Name Type Description Defaulthost
str
Host IP address for the back-end service to connect to. If None, connects to a Megagon-hosted service.
None
project
str
Project name. The name needs to be unique within the host domain.
None
token
str
User's authentication token.
None
auth
Authentication
Authentication object. Can be skipped if a valid token is provided.
None
"},{"location":"references/service/#meganno_client.service.Service.show","title":"show(config={})
","text":"Show project management dashboard in a floating dashboard.
"},{"location":"references/service/#meganno_client.service.Service.get_service_endpoint","title":"get_service_endpoint(key=None)
","text":"Get REST endpoint for the connected project. Endpoints are composed from base project url and routes for specific requests.
Parameters:
Name Type Description Defaultkey
str
Name of the specific request. Mapping to routes is stored in a dictionary SERVICE_ENDPOINTS
in constants.py
.
None
"},{"location":"references/service/#meganno_client.service.Service.get_base_payload","title":"get_base_payload()
","text":"Get the base payload for any REST request which includes the authentication token.
"},{"location":"references/service/#meganno_client.service.Service.get_schemas","title":"get_schemas()
","text":"Get schema object for the connected project.
"},{"location":"references/service/#meganno_client.service.Service.get_statistics","title":"get_statistics()
","text":"Get the statistics object for the project which supports calculations in the management dashboard.
"},{"location":"references/service/#meganno_client.service.Service.get_users_by_uids","title":"get_users_by_uids(uids: list = [])
","text":"Get user names by their unique IDs.
Parameters:
Name Type Description Defaultuids
list
list of unique user IDs.
[]
"},{"location":"references/service/#meganno_client.service.Service.get_annotator","title":"get_annotator()
","text":"Get annotator's own name and user ID. The back-end service distinguishes annotator by the token or auth object used to initialize the connection.
"},{"location":"references/service/#meganno_client.service.Service.search","title":"search(limit=DEFAULT_LIST_LIMIT, skip=0, uuid_list=None, keyword=None, regex=None, record_metadata_condition=None, annotator_list=None, label_condition=None, label_metadata_condition=None, verification_condition=None)
","text":"Search the back-end database based on user-provided predicates.
Parameters:
Name Type Description Defaultlimit
The maximum number of records returned in the subset.
DEFAULT_LIST_LIMIT
skip
skip index of returned subset (excluding the first skip
rows from the raw results ordered by importing order).
0
uuid_list
list of record uuids to filter on
None
keyword
Term for exact keyword searches.
None
regex
Term for regular expression searches.
None
record_metadata_condition
{\"name\": # name of the record-level metadata to filter on \"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\", \"value\": # value to complete the expression}
None
annotator_list
list of annotator names to filter on
None
label_condition
Label condition of the annotation. {\"name\": # name of the label to filter on \"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\"|\"conflicts\", \"value\": # value to complete the expression}
None
label_metadata_condition
Label metadata condition of the annotation. Note this can be on different labels than label_condition {\"label_name\": # name of the associated label \"name\": # name of the label-level metadata to filter on \"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\", \"value\": # value to complete the expression}
None
verification_condition
verification condition of the annotation. {\"label_name\": # name of the associated label \"search_mode\":\"ALL\"|\"UNVERIFIED\"|\"VERIFIED\"}
None
Returns:
Name Type Descriptionsubset
Subset
Subset meeting the search conditions.
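To make the condition dictionaries more concrete, here is a small local sketch of how a record-level metadata condition could be evaluated. The back-end performs the real filtering; the matches helper, the metadata shape, and the length metadata name are assumptions for illustration, and the key is assumed to be spelled operator as in label_metadata_condition.

```python
import operator

# Comparison operators listed for record_metadata_condition.
_OPS = {
    '==': operator.eq,
    '<': operator.lt,
    '>': operator.gt,
    '<=': operator.le,
    '>=': operator.ge,
}


def matches(metadata, condition):
    # metadata: dict mapping record-level metadata name -> value (assumed shape).
    name = condition['name']
    op = condition['operator']
    if op == 'exists':
        return name in metadata
    if name not in metadata:
        return False
    return _OPS[op](metadata[name], condition['value'])


condition = {'name': 'length', 'operator': '>=', 'value': 20}
records = [
    {'uuid': 'a', 'metadata': {'length': 12}},
    {'uuid': 'b', 'metadata': {'length': 35}},
    {'uuid': 'c', 'metadata': {}},
]
selected = [r['uuid'] for r in records if matches(r['metadata'], condition)]
print(selected)
```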
"},{"location":"references/service/#meganno_client.service.Service.deprecate_submit_annotations","title":"deprecate_submit_annotations(subset=None, uuid_list=[])
","text":"Submit annotations for records in a subset to the back-end service database. Results are filtered to only include annotations owned by the authenticated annotator.
Parameters:
Name Type Description Defaultsubset
Subset
The subset object containing records and annotations.
None
uuid_list
list
Additional filter. Only subset records whose uuid are in this list will be submitted.
[]
"},{"location":"references/service/#meganno_client.service.Service.submit_annotations","title":"submit_annotations(subset=None, uuid_list=[])
","text":"Submit annotations for a batch of records in a subset to the back-end service database. Results are filtered to only include annotations owned by the authenticated annotator.
Parameters:
Name Type Description Defaultsubset
Subset
The subset object containing records and annotations.
None
uuid_list
list
Additional filter. Only subset records whose uuid are in this list will be submitted.
[]
"},{"location":"references/service/#meganno_client.service.Service.import_data_url","title":"import_data_url(url='', file_type=None, column_mapping={})
","text":"Import data from a public url, currently only supporting csv files. Each row corresponds to a data record. The file needs at least two columns: one with a unique id for each row, and one with the raw data content.
Parameters:
Name Type Description Defaulturl
str
Public url for csv file
''
file_type
str
Currently only supporting type 'CSV'
None
column_mapping
dict
Dictionary with fields id
specifying id column name, and content
specifying content column name. For example, with a csv file with two columns index
and tweet
:
{\n \"id\": \"index\",\n \"content\": \"tweet\"\n}\n
{}
"},{"location":"references/service/#meganno_client.service.Service.import_data_df","title":"import_data_df(df, column_mapping={})
","text":"Import data from a pandas DataFrame. Each row corresponds to a data record. The dataframe needs at least two columns: one with a unique id for each row, and one with the raw data content.
Parameters:
Name Type Description Defaultdf
DataFrame
Qualifying dataframe
requiredcolumn_mapping
dict
Dictionary with fields id
specifying id column name, and content
specifying content column name. Using a dataframe, users can import metadata at the same time. For example, given a dataframe with two columns index
and tweet
, plus a column location
:
{\n \"id\": \"index\",\n \"content\": \"tweet\",\n \"metadata\": \"location\"\n}\n
metadata with name location
will be created for all imported data records. {}
"},{"location":"references/service/#meganno_client.service.Service.export","title":"export()
","text":"Exporting function.
Returns:
Name Type Descriptionexport_df
DataFrame
A pandas dataframe with columns 'data_id', 'content', 'annotator', 'label_name', 'label_value'
for all records in the project
set_metadata(meta_name, func, batch_size=500)
","text":"Set metadata for all records in the back-end database, based on user-defined function for metadata calculation.
Parameters:
Name Type Description Defaultmeta_name
str
Name of the metadata. Will be used to identify and query the metadata.
requiredfunc
function(raw_content)
Function that takes the raw data content as input and returns the corresponding metadata (int, string, vectors...).
requiredbatch_size
int
Batch size for back-end database updates.
500
Example from sentence_transformers import SentenceTransformer\n\nmodel = SentenceTransformer('all-MiniLM-L6-v2')\n# set metadata generation function for service object demo\ndemo.set_metadata(\"bert-embedding\",\n lambda x: list(model.encode(x).astype(float)), 500)\n
"},{"location":"references/service/#meganno_client.service.Service.get_assignment","title":"get_assignment(annotator=None, latest_only=False)
","text":"Get workload assignment for annotator.
Parameters:
Name Type Description Defaultannotator
str
User ID to query. If set to None, use ID of auth token holder.
None
latest_only
bool
If true, return only the last assignment for the user. Else, return the set of all assigned records.
False
"},{"location":"references/statistic/","title":"Statistic","text":""},{"location":"references/statistic/#meganno_client.statistic.Statistic","title":"meganno_client.statistic.Statistic
","text":"The Statistic class contains methods to show basic statistics of the labeling project. Mostly used to back views in the monitoring dashboard.
Attributes:
Name Type Description__service
Service
Service object for the connected project.
"},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_label_progress","title":"get_label_progress()
","text":"Get the overall progress of annotation.
Returns:
Name Type Descriptionresponse
dict
A dictionary with fields total
showing total number for data records, and annotated
showing number of records with any label from at least one annotator.
get_label_distributions(label_name: str = None)
","text":"Get the class distribution of a selected label. If multiple annotators labeled the same record, aggregate using majority vote
.
Parameters:
Name Type Description Defaultlabel_name
str
Name of label as specified in the schema.
None
Returns:
Name Type Descriptionresponse
dict
A dictionary showing aggregated class frequencies. Example: {'neg': 60, 'neu': 14, 'pos': 27, 'tied_annotations': 3}
. tied_annotations
counts the records for which more than one class ties for the majority vote.
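The majority-vote aggregation described above can be sketched as follows. This is only an illustration: label_distribution is a hypothetical helper, and whether the service excludes tied records from the per-class counts (as done here) is an assumption.

```python
from collections import Counter


def label_distribution(votes_by_record):
    # votes_by_record: record id -> list of label values, one per annotator.
    distribution = Counter()
    for votes in votes_by_record.values():
        if not votes:
            continue  # unlabeled records do not contribute
        ranked = Counter(votes).most_common()
        top_count = ranked[0][1]
        winners = [label for label, count in ranked if count == top_count]
        if len(winners) > 1:
            # More than one class shares the highest vote count.
            distribution['tied_annotations'] += 1
        else:
            distribution[winners[0]] += 1
    return dict(distribution)


votes = {
    'r1': ['pos', 'pos', 'neg'],
    'r2': ['neg'],
    'r3': ['pos', 'neg'],
}
result = label_distribution(votes)
print(result)
```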
get_annotator_contributions()
","text":"Get contributions of annotators in terms of records labeled.
Returns:
Name Type Descriptionresponse
dict
A dictionary where keys are annotator IDs and values are total numbers of annotated records by each annotator.
"},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_annotator_agreements","title":"get_annotator_agreements(label_name: str = None)
","text":"Get pairwise agreement score between all contributing annotators to the project, on the specified label. The default agreement calculation method is cohen_kappa
.
Parameters:
Name Type Description Defaultlabel_name
str
Name of label as specified in the schema.
None
Returns:
Name Type Descriptionresponse
dict
A dictionary where keys are pairs of annotator IDs, and values are their agreement scores. The higher the score, the more frequently the pair of annotators agrees.
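For readers unfamiliar with Cohen's kappa, the sketch below computes it for one pair of annotators from scratch. It is a textbook formulation, not the service's code; the function name and the sample labels are made up, and both lists are assumed to cover the same records in the same order.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    # labels_a, labels_b: labels from two annotators for the same records, same order.
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError('need two equally long, non-empty label lists')
    # Observed agreement: fraction of records where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's class frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical class throughout
    return (observed - expected) / (1 - expected)


a = ['pos', 'pos', 'neg', 'neg']
b = ['pos', 'neg', 'neg', 'neg']
kappa = cohen_kappa(a, b)
print(kappa)
```

Unlike raw agreement, kappa discounts the agreement two annotators would reach by chance, so a pair that always picks the dominant class does not automatically score high.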
"},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_embeddings","title":"get_embeddings(label_name: str = None, embed_type: str = None)
","text":"Return 2-dimensional TSNE projection of the text embedding for data records, together with their aggregated labels (using majority votes). Used for projection view in the monitoring dashboard.
Parameters:
Name Type Description Defaultlabel_name
str
Name of label as specified in the schema.
None
embed_type
str
the meta_name for the specified embedding
None
Returns:
Name Type Descriptionresponse
dict
A dictionary with fields agg_label
showing aggregated class label, x_axis
and y_axis
showing projected 2d coordinates.
meganno_client.subset.Subset
","text":"The Subset class is used to represent a group of data records
Attributes:
Name Type Description__data_uuids
list
List of unique identifiers of data records in the subset.
__service
Service
Connected backend service
__my_annotation_list
list
Local cache of the record and annotation view of the subset owned by service.annotator_id, with all possible metadata.
"},{"location":"references/subset/#meganno_client.subset.Subset.__init__","title":"__init__(service, data_uuids=[], job_id=None)
","text":"Init function
Parameters:
Name Type Description Defaultservice
Service
Service-class object identifying the connected backend service and corresponding data storage
requireddata_uuids
list
List of data uuid's to be included in the subset
[]
"},{"location":"references/subset/#meganno_client.subset.Subset.get_uuid_list","title":"get_uuid_list()
","text":"Get list of unique identifiers for all records in the subset.
Returns:
Name Type Description__data_uuids
list
List of data uuids included in Subset
"},{"location":"references/subset/#meganno_client.subset.Subset.value","title":"value(annotator_list: list = None)
","text":"Check for cached data and annotations of service owner, or retrieve for other annotators (not cached).
Parameters:
Name Type Description Defaultannotator_list
list
If None, retrieve the cached annotations of the service owner; otherwise, fetch live annotations from the listed annotators.
None
Returns:
Name Type Descriptionsubset_annotation_list
list
See __get_annotation_list
for description and example.
get_annotation_by_uuid(uuid)
","text":"Return the annotation for a particular data record (specified by uuid)
Parameters:
Name Type Description Defaultuuid
str
the uuid for the data record specified by user
requiredReturns:
Name Type Descriptionannotation
dict
Annotation for specified data record if it exists else None
"},{"location":"references/subset/#meganno_client.subset.Subset.show","title":"show(config={})
","text":"Visualize the current subset in an in-notebook annotation widget.
Development note: initializing an Annotation widget, creating unique reference to the associated subset and service.
Parameters:
Name Type Description Defaultconfig
dict
Configuration for default view of the widget.
- view : \"single\" | \"table\", default \"single\"\n- mode : \"annotating\" | \"reconciling\", default \"annotating\"\n- title: default \"Annotation\"\n- height: default 300 (pixels)\n
{}
"},{"location":"references/subset/#meganno_client.subset.Subset.set_annotations","title":"set_annotations(uuid=None, labels=None)
","text":"Set the annotation for a particular data record with the specified label
Parameters:
Name Type Description Defaultuuid
str
the uuid for the data record specified by user
None
labels
dict
The labels for the data record at record and span level, with the following structure:
- \"labels_record\" : list\n A list of record-level labels\n- \"labels_span\" : list\n A list of span-level labels\n\nExamples\n-------\n\nExample of setting an annotation with the desired record and span level labels:\n```json\n{\n \"labels_record\": [\n {\n \"label_name\": \"sentiment\",\n \"label_value\": [\"neu\"]\n }\n ],\n\n \"labels_span\": [\n {\n \"label_name\": \"sentiment\",\n \"label_value\": [\"neu\"],\n \"start_idx\": 10,\n \"end_idx\": 20\n }\n ]\n}\n```\n
None
Raises:
Type DescriptionException
If uuid or labels is None
Returns:
Name Type Descriptionlabels
dict
Updated labels for uuid annotated by user
"},{"location":"references/subset/#meganno_client.subset.Subset.get_reconciliation_data","title":"get_reconciliation_data(uuid_list=None)
","text":"Return the list of reconciliation data for all data entries specified by user. The reconciliation data for one data record consists of the annotations for it by all annotators
Parameters:
Name Type Description Defaultuuid_list
list
list of uuid's provided by user. If None, use all records in the subset
None
Returns:
Name Type Descriptionreconciliation_data_list
list
List of reconciliation data for each uuid with the following keys: annotation_list
which specifies all the annotations for the uuid, data
which contains the raw data specified by the uuid, metadata
which stores additional information about the data, tokens
, and the uuid
of the data record Full Example:
{\n \"annotation_list\": [\n {\n \"annotator\": \"pwOA1N9RKZVJM8VZZ7w8VcT8lp22\",\n \"labels_record\": [],\n \"labels_span\": []\n },\n {\n \"annotator\": \"IAzgHOxyeLQBi5QVo7dQR0p2DpA2\",\n \"labels_record\": [\n {\n \"label_name\": \"sentiment\",\n \"label_value\": [\"pos\"]\n }\n ],\n \"labels_span\": []\n }\n ],\n \"data\": \"@united obviously\",\n \"metadata\": [],\n \"tokens\": [],\n \"uuid\": \"ee408271-df5d-435c-af25-72df58a21bfe\"\n}\n
"},{"location":"references/subset/#meganno_client.subset.Subset.suggest_similar","title":"suggest_similar(record_meta_name, limit=3)
","text":"For each data record in the subset, suggest similar data records by retrieving the closest records from the pool, based on metadata (e.g., embedding) distance.
Parameters:
Name Type Description Defaultrecord_meta_name
str
The metadata name (e.g. \"bert-embedding\") on which the similarity is calculated.
requiredlimit
int
The number of matching/similar records desired to be returned. Default is 3
3
Raises:
Type DescriptionException
If response code is not successful
Returns:
Name Type Descriptionsubset
Subset
A subset of similar data entries
"},{"location":"references/subset/#meganno_client.subset.Subset.assign","title":"assign(annotator)
","text":"Assign the current subset as payload to an annotator.
Parameters:
Name Type Description Defaultannotator
str
Annotator ID.
required"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to MEGAnno documentation","text":""},{"location":"#how-to-get-started","title":"How to get started?","text":"There are 2 ways to get started with MEGAnno:
1. Demo system access: We prepared a Google Colab notebook for this demo. To run the Colab notebook, you\u2019ll need a Google account, an OpenAI API key, and a MEGAnno access token (you can get this by filling out the request form).
2. Your own MEGAnno environment: To set up MEGAnno for your own projects, you can set up your own self-hosted MEGAnno service. Please follow the self-hosted installation instructions.
"},{"location":"#what-is-meganno","title":"What is MEGAnno?","text":"Many existing data annotation tools focus on the annotator enabling them to annotate data and manage annotation activities. Instead, MEGAnno is an open-source data annotation tool that puts the data scientist first, enabling you to bootstrap annotation tasks and manage the continual evolution of annotations through the machine learning lifecycle.
In addition, MEGAnno\u2019s unique capabilities include:
A back-end service that acts as a single source of truth and stores/manages all the evolution of annotation information through the lifecycle.
Power tools to explore data sets and select the best data to label. Accommodations for active learning and other techniques to prioritize your labeling work.
Explore the distribution of labels and the behavior of annotators to make decisions for subsequent labeling batches.
A data scientist-focused experience enabling you to manage annotation directly in your notebooks. This allows you to utilize existing Python functions and our built-in power tools to optimize your annotation process.
Figure 1. MEGAnno's unique capabilities
"},{"location":"#system-overview","title":"System Overview","text":"MEGAnno provides two key components: (1) a Python client library featuring interactive widgets and (2) a back-end service consisting of web API and database servers. To use our system, a user can interact with a Jupyter Notebook that has the MEGAnno client installed. Through programmatic interfaces and UI widgets, the client communicates with the service. Figure 2. Overview of MEGAnno+ system.
Please see the Getting Started page for setup instructions and the Advanced Features page for more cool features we provide.
"},{"location":"#references","title":"References","text":"@inproceedings{kim-etal-2024-meganno,\n title = \"{MEGA}nno+: A Human-{LLM} Collaborative Annotation System\",\n author = \"Kim, Hannah and Mitra, Kushan and Li Chen, Rafael and Rahman, Sajjadur and Zhang, Dan\",\n editor = \"Aletras, Nikolaos and De Clercq, Orphee\",\n booktitle = \"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations\",\n month = mar,\n year = \"2024\",\n address = \"St. Julians, Malta\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2024.eacl-demo.18\",\n pages = \"168--176\",\n}\n
@inproceedings{zhang-etal-2022-meganno,\n title = \"{MEGA}nno: Exploratory Labeling for {NLP} in Computational Notebooks\",\n author = \"Zhang, Dan and Kim, Hannah and Li Chen, Rafael and Kandogan, Eser and Hruschka, Estevam\",\n editor = \"Dragut, Eduard and Li, Yunyao and Popa, Lucian and Vucetic, Slobodan and Srivastava, Shashank\",\n booktitle = \"Proceedings of the Fourth Workshop on Data Science with Human-in-the-Loop (Language Advances)\",\n month = dec,\n year = \"2022\",\n address = \"Abu Dhabi, United Arab Emirates (Hybrid)\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2022.dash-1.1\",\n pages = \"1--7\",\n}\n
"},{"location":"advanced/","title":"Advanced features","text":"This notebook provides examples of some of the advanced features.
"},{"location":"advanced/#updating-schema","title":"Updating Schema","text":"Annotation requirements can change as projects evolve. To update the schema for a project, simply call set_schemas
with the new schema object. For example, to expand the schema we set in the basic notebook:
demo.get_schemas().set_schemas({\n \"label_schema\": [\n {\n \"name\": \"sentiment\",\n \"level\": \"record\", \n \"options\": [\n { \"value\": \"pos\", \"text\": \"positive\" },\n { \"value\": \"neg\", \"text\": \"negative\" },\n { \"value\": \"neu\", \"text\": \"neutral\" } # adding a new option\n ]\n },\n # adding a span-level label\n {\n \"name\": \"sp\",\n \"level\": \"span\", \n \"options\": [\n { \"value\": \"pos\", \"text\": \"positive\" },\n { \"value\": \"neg\", \"text\": \"negative\" },\n ]\n }\n ]\n})\n
Only the latest schema will be active, but all previous ones will be preserved. To see the full history: demo.get_schemas().get_history()\n
"},{"location":"advanced/#metadata","title":"Metadata","text":"In MEGAnno, metadata refers to auxiliary information associated with data records. MEGAnno takes user-defined functions to generate metadata and uses it to find important subsets and assist human annotators. Here we show two examples.
Example 1: Adding sentence bert embeddings for data records. The embeddings can later be used to make similarity computations over records.
# Example 1, adding sentence-bert embedding.\nfrom sentence_transformers import SentenceTransformer\nmodel = SentenceTransformer(\"all-MiniLM-L6-v2\")\n# set metadata generation function \ndemo.set_metadata(\"bert-embedding\",lambda x: list(model.encode(x).astype(float)), 500)\n
Example 2: Extracting hashtags as annotation context.
# user defined function to extract hashtag\ndef extract_hashtags(text):\n hashtag_list = []\n for word in text.split():\n if word[0] == \"#\":\n hashtag_list.append(word[:])\n # widget can render markdown text\n return \"\".join([\"- {}\\n\".format(x) for x in hashtag_list])\n\n# apply metadata to the project\ndemo.set_metadata(\"hashtag\", lambda x: extract_hashtags(x), 500)\n
With hashtag
metadata, MEGAnno widget can show it as context at annotation time.
s1= demo.search(keyword=\"\", limit=50, skip=0, meta_names=[\"hashtag\"])\ns1.show()\n
"},{"location":"advanced/#advanced-subset-generation","title":"Advanced Subset Generation","text":"In addition to exact keyword matches, MEGAnno also provides more advanced approaches of generating subsets.
"},{"location":"advanced/#regex-based-searches","title":"Regex-based Searches","text":"MEGAnno supports searches based on regular expressions:
s2_reg= demo.search(regex=\".* (delay) .*\", limit=50, skip=0)\ns2_reg.show({\"view\": \"table\"})\n
"},{"location":"advanced/#subset-suggestion","title":"Subset Suggestion","text":"Searches initiated by users can help them explore the dataset in a controlled way. Still, the quality of searches is only as good as users\u2019 knowledge about the data and domain. MEGAnno provides an automated subset suggestion engine to assist with exploration. Embedding-based suggestions make suggestions based on data-embedding vectors provided by the user (as metadata).
For example, suggest_similar suggests neighbors (based on distance in the embedding space) of data in the querying subset:
s3 = demo.search(keyword=\"delay\", limit=3, skip=0) # source subset\ns4 = s3.suggest_similar(\"bert-embedding\", limit=4) # needs to provide a valid meta_name\ns4.show()\n
"},{"location":"advanced/#subset-operations","title":"Subset Operations","text":"MEGAnno supports set operations to build more subsets from others:
# intersection\ns_intersection = s1 & s2 # or s1.intersection(s2)\n# union\ns_union = s1 | s2 # or s1.union(s2)\n# difference\ns_diff = s1 - s2 # or s1.difference(s2)\n
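One way such operators can be exposed in Python is by mapping them onto named set methods. The class below is a hypothetical stand-in that only tracks record uuids; it is a sketch of the idea, not the actual Subset implementation.

```python
class UuidSubset:
    # Hypothetical stand-in for Subset that only tracks record uuids.
    def __init__(self, uuids):
        self.uuids = list(dict.fromkeys(uuids))  # de-duplicate, keep order

    def intersection(self, other):
        keep = set(other.uuids)
        return UuidSubset(u for u in self.uuids if u in keep)

    def union(self, other):
        return UuidSubset(self.uuids + other.uuids)

    def difference(self, other):
        drop = set(other.uuids)
        return UuidSubset(u for u in self.uuids if u not in drop)

    # Map the operators &, | and - onto the named methods.
    __and__ = intersection
    __or__ = union
    __sub__ = difference


left = UuidSubset(['a', 'b', 'c'])
right = UuidSubset(['b', 'c', 'd'])
print((left & right).uuids, (left | right).uuids, (left - right).uuids)
```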
"},{"location":"advanced/#dashboard-administrator-only","title":"Dashboard (administrator-only)","text":"MEGAnno provides a built-in visual monitoring dashboard to help users to get real-time status of the annotation project. As projects evolve, users would often need to understand the project\u2019s status to make decisions about the next steps, like collecting more data points with certain characteristics or adding a new class to the task definition. To aid such analysis, the dashboard widget packs common statistics and analytical visualizations (e.g., annotation progress, distribution of labels, annotator agreement, etc.) based on a survey of our pilot users.
To bring up the project dashboard:
demo.show()\n
Other features
Assignment and dispatch: You may assign a subset to a particular annotator
s1.assign(annotator_id)\n
Multiple annotators and reconciliation: You are also able to view a reconciled list of annotations from multiple annotators
s1.get_reconciliation_data()\n
Please also refer to this notebook for a running example of the basic pipeline of using MEGAnno in a notebook.
"},{"location":"basic/#setting-schema","title":"Setting Schema","text":"Schema defines the annotation task. Example of setting schema for a sentiment analysis task with positive and negative options.
demo.get_schemas().set_schemas({\n \"label_schema\": [\n {\n \"name\": \"sentiment\",\n \"level\": \"record\", \n \"options\": [\n { \"value\": \"pos\", \"text\": \"positive\" },\n { \"value\": \"neg\", \"text\": \"negative\" },\n ]\n }\n ]\n})\ndemo.get_schemas().value(active=True) \n
A label can be defined to have level record
or span
. Record-level labels correspond to the entire data record, while span-level labels are associated with a text span in the record. See Updating Schema for an example of a more complex schema."},{"location":"basic/#importing-data","title":"Importing Data","text":"Given a pandas dataframe like this (example generated from this Twitter US Airline Sentiment dataset):
id tweet 0 @united how else would I know it was denied? 1 @JetBlue my SIL bought tix for us to NYC. We were told at the gate that her cc was declined. Supervisor accused us of illegal activity. 2 @JetBlue dispatcher keeps yelling and hung up on me!Importing data is easy by providing column names for id
which is a unique importing identifier for data records, and content
which is the raw text field.
demo.import_data_df(df, column_mapping={\n \"id\": \"id\",\n \"content\": \"tweet\"\n})\n
"},{"location":"basic/#exploratory-labeling","title":"Exploratory Labeling","text":"Not all data points are equally important for downstream models and applications. There are often cases where users might want to prioritize a particular batch (e.g., to achieve better class or domain coverage or focus on the data points that the downstream model cannot predict well). MEGAnno provides a flexible and controllable way of organizing annotation projects through exploratory labeling. This annotation process works by first identifying an interesting subset and then assigning labels to the data in that subset. We provide a set of \u201cpower tools\u201d to help identify valuable subsets.
The script below shows an example of searching for data records with keyword \"delay\" and bringing up a widget for annotation in the next cell. More examples here.
# search results => subset s1\ns1 = demo.search(keyword=\"delay\", limit=10, skip=0)\n# bring up a widget \ns1.show()\n
"},{"location":"basic/#column-filters","title":"Column Filters","text":"To view all column filters, click the \"Filters\" button; to reset all column filters, click the \"Reset filters\" button.
"},{"location":"basic/#column-order-visibility","title":"Column Order & Visibility","text":"1. To re-order or re-size a column, mouse over the column's drag handle (the left grip handle to re-order, the right column edge to re-size). 2. To toggle column visibility, click \"Columns\", then toggle a column to show or hide it. 3. To reset column ordering and visibility, click the \"Reset columns\" button.
"},{"location":"basic/#metadata-focus-view","title":"Metadata Focus-view","text":"To focus on a single metadata value, click the \"Settings\" button, then choose a metadata name from the list.
"},{"location":"basic/#exporting","title":"Exporting","text":"Although iterations can happen within a single notebook, it's easy to export the data and the annotations collected:
# collecting the annotation generated by all annotators\ndemo.export()\n
"},{"location":"llm_integration/","title":"LLM Integration","text":"This notebook provides an example workflow of utilizing LLMs as annotation agents within MEGAnno.
Figure 1. Human-LLM collaborative workflow.
MEGAnno offers a simple human-LLM collaborative annotation workflow: LLM annotation followed by human verification. Put simply, LLM agents label data first (Figure 1, step \u2460), and humans verify LLM labels as needed. For most tasks and datasets one can use LLM labels as is; for some subset of difficult or uncertain instances (Figure 1, step \u2461), humans can verify LLM labels \u2013 confirm the right ones and correct the wrong ones (Figure 1, step \u2462). In this way, the LLM annotation part can be automated, and human efforts can be directed to where they are most needed to improve the quality of final labels.
An overview of the entire system and its key concepts is shown below.
Figure 2. Overview of MEGAnno+ system.
Subset: refers to a slice of data created from user-defined searches.
Record: refers to an item within the data corpus.
Agent: an Agent is defined by the configuration of the LLM (e.g., model\u2019s name, version, and hyper-parameters) and a prompt template.
Job: when an Agent is employed to annotate a selected data Subset, the execution is referred to as a Job.
Label: stores the label assigned to a particular Record
Label_Metadata: captures additional aspects of a label, such as LLM confidence score or length of label response, etc.
Verification: captures annotations from human users that confirm or update LLM labels
"},{"location":"llm_integration/#llm-annotation","title":"LLM Annotation","text":"MEGAnno achieves LLM annotation in three steps, as shown in the figure below.
Figure 3. Steps in the LLM annotation workflow.
The preprocessing step handles the generation of prompts and validation of model configuration. Users can specify a particular LLM model, define its configurations and customize a prompt template (Figure 4). This defines an Agent which can be used for the annotation task. Registered Agents can be reused later.
Figure 4. Prompt Template UI. Users can customize task instructions and preview generated prompts.
After the selected model configuration is validated, the next step is calling the LLM. MEGAnno handles the call to the external LLM API to obtain LLM responses. Any API errors encountered during the call are also appropriately handled and a suitable message is relayed to the user.
Once the responses are obtained, the post-processing step extracts the label from the LLM response. Our post-processing step ensures some minor deviations in the LLM's response (such as trailing period) are handled. Furthermore, users can set fuzzy_extraction=True
which performs a fuzzy match between the LLM response and the label schema space, and if a significant match is found the corresponding label is attributed for the task. The figure below shows how MEGAnno's post-processing mechanism handles several LLM responses.
Figure 5. Example LLM responses and post-processing results by MEGAnno.
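The post-processing idea described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not MEGAnno's implementation: the function name extract_label, the cutoff value, and the use of difflib for the fuzzy match are all assumptions made for the example.

```python
import difflib

def extract_label(response, options, fuzzy=False, cutoff=0.8):
    # Hypothetical helper: map a raw LLM response onto one of the schema options.
    # Normalize minor deviations such as whitespace, case, or a trailing period.
    cleaned = response.strip().rstrip('.').strip().lower()
    lookup = {opt.lower(): opt for opt in options}
    if cleaned in lookup:
        return lookup[cleaned]
    if fuzzy:
        # Fall back to the closest option, but only if it is similar enough.
        close = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
        if close:
            return lookup[close[0]]
    return None  # no valid label could be extracted

options = ['pos', 'neg']
exact = extract_label('pos.', options)
fuzzy_hit = extract_label('poss', options, fuzzy=True)
no_match = extract_label('I am not sure', options, fuzzy=True)
```

With fuzzy matching disabled, a near miss such as 'poss' is treated as an invalid response, which mirrors the stricter default behavior described above.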
"},{"location":"llm_integration/#verification-subset-selection","title":"Verification Subset Selection","text":"Having a human verify every annotation in the dataset would defeat the purpose of using LLMs for a cheaper and faster annotation process. Instead, MEGAnno helps human verifiers by computing a confidence score for each annotation. Users can specify confidence_score
of the LLM labels to be computed and stored. They can then view the confidence scores, and even sort as well as filter over them to obtain only those annotations for which the LLM had low confidence scores. This will ease the human verification process and make it more efficient.
Users can then use MEGAnno's in-notebook widget to verify LLM labels, i.e., either confirm a label as correct or reject it and specify the correct label. Users may view the final annotations and export the data for downstream tasks or further analysis.
Figure 6. Verification UI for exploring data and confirming/correcting LLM labels.
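As a rough illustration of the selection step, the snippet below filters and sorts a list of LLM labels by confidence so that the least certain ones are verified first. It is a sketch under assumptions: the record layout (uuid, label, confidence_score keys), the threshold, and the function name are hypothetical and do not reflect MEGAnno's data model.

```python
def select_for_verification(annotations, threshold=0.9, limit=None):
    # Keep only labels whose confidence falls below the threshold.
    low = [a for a in annotations if a['confidence_score'] < threshold]
    # Least confident first, so human effort goes where it is most needed.
    low.sort(key=lambda a: a['confidence_score'])
    return low if limit is None else low[:limit]

# Hypothetical LLM labels with stored confidence scores.
annotations = [
    {'uuid': 'r1', 'label': 'pos', 'confidence_score': 0.98},
    {'uuid': 'r2', 'label': 'neg', 'confidence_score': 0.55},
    {'uuid': 'r3', 'label': 'pos', 'confidence_score': 0.71},
]
to_verify = select_for_verification(annotations, threshold=0.9)
```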
"},{"location":"quickstart/","title":"Getting Started","text":""},{"location":"quickstart/#installation","title":"Installation","text":"We have 2 ways to authenticate with the service:
1. Short-term (1 hour) access with username and password sign-in.
After executing auth = Authentication(project=\"<project_name>\") (this only works for a notebook or terminal running on a local computer), you will be provided with a sign-in interface in a new browser tab.
After signing in, you will be able to generate a long-term personal access token by running auth.create_access_token(expiration_duration=7, note=\"testing\"), where expiration_duration is in days. Set expiration_duration to 0 for a token with no practical expiration (under the hood, it still expires after 100 years).
2. Long-term access with an access token, without signing in every time.
auth = Authentication(project=\"<project_name>\", token=\"<your_token>\")\n
MEGAnno supports 2 types of user roles: Admin and Contributor. Admin users are project owners deploying the services; they have full access to the project such as importing data or updating schemas. Admin users can invite contributors by sharing invitation code(s) with them. Contributors can only access their own annotation namespace and cannot modify the project.
To invite contributors, follow the instructions below:
from meganno_client import Admin, Authentication\ntoken = \"...\"\nauth = Authentication(project=\"<project_name>\", token=token)\n\nadmin = Admin(project=\"eacl_demo\", auth=auth)\n# OR\nadmin = Admin(project=\"eacl_demo\", token=token)\n
admin.create_invitation(single_use=True, code=\"<invitation_code>\", role_code=\"contributor\")\n
admin.get_invitations()\nadmin.renew_invitation(id=\"<invitation_code_id>\")\nadmin.revoke_invitation(id=\"<invitation_code_id>\")\n
auth = Authentication(project=\"<project_name>\")
, a new browser tab will present itself.
GET
POST
/agents administrator
contributor
GET
/agents/jobs /agents/<string:agent_uuid>/jobs GET
POST
/agents/<string:agent_uuid>/jobs/<string:job_uuid> /annotations/<string:record_uuid> administrator
contributor
job
POST
/annotations/batch /annotations/<string:record_uuid>/labels administrator
contributor
/annotations/label_metadata administrator
contributor
job
GET
POST
/assignments administrator
contributor
POST
/data /data/metadata administrator
GET
/data/export /data/suggest_similar administrator
contributor
GET
/schemas administrator
contributor
job
POST
administrator
POST
/verifications/<string:record_uuid>/labels administrator
contributor
GET
/annotations /view/record /view/annotation /view/verifications administrator
contributor
job
/reconciliations administrator
contributor
GET
/statistics/annotator/contributions /statistics/annotator/agreements /statistics/embeddings/<embed_type> /statistics/label/progress /statistics/label/distributions administrator
GET
POST
PUT
DELETE
/invitations administrator
GET
/invitations/<invitation_code> GET
POST
DELETE
/tokens administrator
contributor
"},{"location":"references/controller/","title":"Controller","text":""},{"location":"references/controller/#meganno_client.controller.Controller","title":"meganno_client.controller.Controller
","text":"The Controller class manages annotation agents and runs agent jobs.
"},{"location":"references/controller/#meganno_client.controller.Controller.__init__","title":"__init__(service, auth)
","text":"Init function
Parameters:
Name Type Description Defaultservice
Service
MEGAnno service object for the connected project.
requiredauth
Authentication
MEGAnno authentication object.
required"},{"location":"references/controller/#meganno_client.controller.Controller.list_agents","title":"list_agents(created_by_filter=None, provider_filter=None, api_filter=None, show_job_list=False)
","text":"Get the list of registered agents by their issuer IDs.
Parameters:
Name Type Description Defaultcreated_by_filter
list
List of user IDs to filter agents, by default None (if None, list all)
None
provider_filter
Returns agents with the specified provider, e.g., openai
None
api_filter
Returns agents with the specified api, e.g., completion
None
show_job_list
If True, also return the list of job uuids for each agent.
False
Returns:
Type Descriptionlist
A list of agents that are created by specified issuers.
"},{"location":"references/controller/#meganno_client.controller.Controller.list_jobs","title":"list_jobs(filter_by, filter_values, show_agent_details=False)
","text":"Get the list of jobs with querying filters.
Parameters:
Name Type Description Defaultfilter_by
str
Filter options. Must be [\"agent_uuid\" | \"issued_by\" | \"uuid\"] | None
requiredfilter_values
list
List of uuids of entity specified in 'filter_by'
requiredshow_agent_details
bool
If True, return agent configuration, by default False
False
Returns:
Type Descriptionlist
A list of jobs that match given filtering criteria.
"},{"location":"references/controller/#meganno_client.controller.Controller.list_jobs_of_agent","title":"list_jobs_of_agent(agent_uuid, show_agent_details=False)
","text":"Get the list of jobs of a given agent.
Parameters:
Name Type Description Defaultagent_uuid
str
Agent uuid
requiredshow_agent_details
bool
If True, return agent configuration, by default False
False
Returns:
Type Descriptionlist
A list of jobs of a given agent
"},{"location":"references/controller/#meganno_client.controller.Controller.register_agent","title":"register_agent(model_config, prompt_template_str, provider_api)
","text":"Register an agent with backend service.
Parameters:
Name Type Description Defaultmodel_config
dict
Model configuration object
requiredprompt_template_str
str
Serialized prompt template
requiredprovider_api
str
Name of provider and corresponding api eg. 'openai:chat'
requiredReturns:
Type Descriptiondict
object with unique agent id.
"},{"location":"references/controller/#meganno_client.controller.Controller.persist_job","title":"persist_job(agent_uuid, job_uuid, label_name, annotation_uuid_list)
","text":"Given annotations for a subset, persist them as a job for the project.
Parameters:
Name Type Description Defaultagent_uuid
str
Agent uuid
requiredjob_uuid
str
Job uuid
requiredlabel_name
str
Label name used for annotation
requiredannotation_uuid_list
list
List of uuids of records that have valid annotations from the job
requiredReturns:
Type Descriptiondict
Object with job uuid and annotation count
"},{"location":"references/controller/#meganno_client.controller.Controller.create_agent","title":"create_agent(model_config, prompt_template, provider_api='openai:chat')
","text":"Validate model configs and register a new agent. Return new agent's uuid.
Parameters:
Name Type Description Defaultmodel_config
dict
Model configuration object
requiredprompt_template
str
PromptTemplate object
requiredprovider_api
str
Name of provider and corresponding api eg. 'openai:chat'
'openai:chat'
Returns:
Name Type Descriptionagent_uuid
str
Agent uuid
"},{"location":"references/controller/#meganno_client.controller.Controller.get_agent_by_uuid","title":"get_agent_by_uuid(agent_uuid)
","text":"Return agent model configuration, prompt template, and creator id of specified agent.
Parameters:
Name Type Description Defaultagent_uuid
str
Agent uuid
requiredReturns:
Type Descriptiondict
A dict containing agent details.
"},{"location":"references/controller/#meganno_client.controller.Controller.list_my_agents","title":"list_my_agents()
","text":"Get the list of registered agents by me.
Returns:
Name Type Descriptionagents
list
A list of agents that are created by me.
"},{"location":"references/controller/#meganno_client.controller.Controller.list_my_jobs","title":"list_my_jobs(show_agent_details=False)
","text":"Get the list of jobs issued by me.
Parameters:
Name Type Description Defaultshow_agent_details
bool
If True, return agent configuration, by default False
False
Returns:
Name Type Descriptionjobs
list
A list of jobs issued by me.
"},{"location":"references/controller/#meganno_client.controller.Controller.run_job","title":"run_job(agent_uuid, subset, label_name, batch_size=1, num_retrials=2, label_meta_names=[], fuzzy_extraction=False)
","text":"Create, run, and persist an LLM annotation job with given agent and subset.
Parameters:
Name Type Description Defaultagent_uuid
str
Uuid of an agent to be used for the job
requiredsubset
Subset
[Megagon-only] MEGAnno Subset object to be annotated in the job
requiredlabel_name
str
Label name used for annotation
requiredbatch_size
int
Batch size for each OpenAI prompt
1
num_retrials
int
Number of retries to OpenAI in case of a failed response
2
label_meta_names
list of label metadata names to be set
[]
fuzzy_extraction
Set to True if fuzzy extraction desired in post processing
False
Returns:
Name Type Descriptionjob_uuid
str
Job uuid
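To make the role of num_retrials concrete, here is a minimal retry sketch. It assumes that num_retrials counts additional attempts after the first failed call; that interpretation, and the helper name call_with_retrials, are assumptions for illustration rather than the documented behavior of run_job.

```python
def call_with_retrials(call, num_retrials=2):
    # Try once, then retry up to num_retrials more times on failure.
    last_error = None
    for _ in range(1 + num_retrials):
        try:
            return call()
        except Exception as err:
            last_error = err
    raise last_error

# Stand-in for an LLM call that fails twice before succeeding.
attempts = []

def flaky_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError('temporary failure')
    return 'ok'

result = call_with_retrials(flaky_call, num_retrials=2)
```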
"},{"location":"references/openai_job/","title":"OpenAIJob","text":""},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob","title":"meganno_client.llm_jobs.OpenAIJob
","text":"The OpenAIJob class handles calls to OpenAI APIs.
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.__init__","title":"__init__(label_schema={}, label_names=[], records=[], model_config={}, prompt_template=None)
","text":"Init function
Parameters:
Name Type Description Defaultlabel_schema
list
List of label objects
{}
label_names
list
List of label names to be used for annotation
[]
records
list
List of records in [{'data': , 'uuid': }] format
[]
model_config
dict
Parameters for the Open AI model
{}
prompt_template
str
Template based on which prompt to OpenAI is prepared for each record
None
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.set_openai_api_key","title":"set_openai_api_key(openai_api_key, openai_organization)
","text":"Set the API keys necessary for calls to the OpenAI API
Parameters:
Name Type Description Defaultopenai_api_key
str
OpenAI API key provided by user
requiredopenai_organization
str[optional]
OpenAI organization key provided by user
required"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.validate_openai_api_key","title":"validate_openai_api_key(openai_api_key, openai_organization)
staticmethod
","text":"Validate the OpenAI API and organization keys provided by user
Parameters:
Name Type Description Defaultopenai_api_key
str
OpenAI API key provided by user
requiredopenai_organization
str[optional]
OpenAI organization key provided by user
requiredRaises:
Type DescriptionException
If api keys provided by user are invalid, or if any error in calling OpenAI API
Returns:
Name Type Descriptionopenai_api_key
str
OpenAI API key
openai_organization
str
OpenAI Organization key
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.validate_model_config","title":"validate_model_config(model_config, api_name='chat')
staticmethod
","text":"Validate the LLM model config provided by the user. The model should be among the models allowed on MEGAnno, and the parameters should match the format specified by OpenAI
Parameters:
Name Type Description Defaultmodel_config
dict
Model specifications, such as the model name and other parameters (e.g., temperature), as provided by the user
requiredapi_name
str
Name of OpenAI api, e.g., \"chat\" or \"completion\"
'chat'
Raises:
Type DescriptionException
If model is not among the ones provided by MEGAnno, or if configuration format is incorrect
Returns:
Name Type Descriptionmodel_config
dict
Model configurations
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.is_valid_prompt","title":"is_valid_prompt(prompt)
","text":"Validate the prompt generated. It should not exceed the maximum token limit specified by OpenAI. We use the approximation 1 word ~ 1.33 tokens
Parameters:
Name Type Description Defaultprompt
str
Prompt generated for OpenAI based on template and the record data
requiredReturns:
Type Descriptionbool
True if prompt is valid, False otherwise
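The word-based approximation mentioned above can be sketched as follows. The 1.33 tokens-per-word factor comes from the description; the limit of 4096 tokens and the helper names are assumptions for illustration, since the real limit depends on the chosen model.

```python
import math

TOKENS_PER_WORD = 1.33  # approximation quoted in the description above
MAX_TOKENS = 4096       # hypothetical limit; the real value depends on the model

def estimate_tokens(prompt):
    # Count whitespace-separated words and scale by the approximate ratio.
    return math.ceil(len(prompt.split()) * TOKENS_PER_WORD)

def is_within_limit(prompt, max_tokens=MAX_TOKENS):
    return estimate_tokens(prompt) <= max_tokens

estimate = estimate_tokens('label this tweet')
```

A word-count estimate is cheap but rough; an exact count would require the tokenizer of the specific model.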
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.generate_prompts","title":"generate_prompts()
","text":"Helper function. Given a prompt template and a list of records, generate a list of prompts for each record
Returns:
Name Type Descriptionprompts
list
List of tuples of (uuid, generated prompt) for each record in given subset
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_response_length","title":"get_response_length()
","text":"Return the length of the openai response
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_openai_conf_score","title":"get_openai_conf_score()
","text":"Return confidence score of the label, calculated using average of logit scores
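One plausible way to turn per-token scores into a single confidence value is sketched below. It assumes the scores are token log probabilities and that the average is mapped back to the range (0, 1] with an exponential; both points are assumptions, and the actual calculation in get_openai_conf_score may differ.

```python
import math

def confidence_from_logprobs(token_logprobs):
    # Hypothetical helper: average the token log probabilities of the label,
    # then exponentiate to obtain a score between 0 and 1.
    if not token_logprobs:
        return None
    average = sum(token_logprobs) / len(token_logprobs)
    return math.exp(average)

certain = confidence_from_logprobs([0.0, 0.0])
uncertain = confidence_from_logprobs([math.log(0.5), math.log(0.5)])
```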
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.preprocess","title":"preprocess()
","text":"Generate the list of prompts for each record based on the subset and template
Returns:
Name Type Descriptionprompts
list
List of prompts
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.get_llm_annotations","title":"get_llm_annotations(batch_size=1, num_retrials=2, api_name='chat', label_meta_names=[])
","text":"Call OpenAI using the generated prompts, to obtain valid & invalid responses
Parameters:
Name Type Description Defaultbatch_size
int
Batch size for each OpenAI prompt
1
num_retrials
int
Number of retries to OpenAI in case of a failed response
2
api_name
str
Name of OpenAI api, e.g., \"chat\" or \"completion\"
'chat'
label_meta_names
list of label metadata names to be set
[]
Returns:
Name Type Descriptionresponses
list
List of valid responses from OpenAI
invalid_responses
list
List of invalid responses from OpenAI
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.extract","title":"extract(uuid, response, fuzzy_extraction)
","text":"Helper function for post-processing. Extract the label (name and value) from the OpenAI response
Parameters:
Name Type Description Defaultuuid
str
Record uuid
requiredresponse
str
Output from OpenAI
requiredfuzzy_extraction
Set to True if fuzzy extraction desired in post processing
requiredReturns:
Name Type Descriptionret
dict
Returns the label name and label value
"},{"location":"references/openai_job/#meganno_client.llm_jobs.OpenAIJob.post_process_annotations","title":"post_process_annotations(fuzzy_extraction=False)
","text":"Extract outputs from the responses generated by the LLM and format them according to the MEGAnno data model.
Parameters:
Name Type Description Defaultfuzzy_extraction
Set to True if fuzzy extraction desired in post processing
False
Returns:
Name Type Descriptionannotations
list
List of annotations (uuid, label) in format required by MEGAnno
"},{"location":"references/prompt/","title":"PromptTemplate","text":""},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate","title":"meganno_client.prompt.PromptTemplate
","text":"The PromptTemplate class represents a prompt template for LLM annotation.
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.__init__","title":"__init__(label_schema, label_names=[], template='', **kwargs)
","text":"Init function
Parameters:
Name Type Description Defaultlabel_schema
list
List of label objects
requiredlabel_names
list
List of label names to be used for annotation, by default []
[]
template
str
Stringified template with input slot, by default ''
''
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_schema","title":"set_schema(label_schema, label_names)
","text":"A helper function to set schema to be used in prompt template.
Parameters:
Name Type Description Defaultlabel_schema
[]
List of label objects
requiredlabel_names
[]
List of label names to be used for annotation, by default all labels
required"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_instruction","title":"set_instruction(**kwargs)
","text":"Update template's task instruction and/or formatting instruction.
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.build_template","title":"build_template(task_inst, format_inst, f=lambda x: x)
","text":"A helper function to build template. Return a stringified prompt template with input slot.
Parameters:
Name Type Description Defaulttask_inst
str
Task instruction template. Must include '{name}' and '{options}'.
requiredformat_inst
str
Formatting instruction template. Must include '{format_sample}'.
requiredf
function
Use color() to decorate string for print, by default lambda x:x
lambda x: x
Returns:
Name Type Descriptiontemplate
str
Stringified prompt template with input slot
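The placeholder requirements above can be illustrated with plain string formatting. This sketch is not the PromptTemplate implementation: the function names, the example instructions, and the name of the input slot ('{input}') are assumptions chosen for the example.

```python
def sketch_build_template(task_inst, format_inst, name, options, format_sample):
    # Fill the task instruction with the label name and its options.
    task = task_inst.format(name=name, options=', '.join(options))
    # Fill the formatting instruction with a sample answer.
    fmt = format_inst.format(format_sample=format_sample)
    # Leave a literal input slot to be filled per record (hypothetical slot name).
    return ' '.join([task, fmt, 'Input: {input}'])

def sketch_get_prompt(template, input_str):
    # Replace only the input slot, so braces inside the record text are left alone.
    return template.replace('{input}', input_str)

template = sketch_build_template(
    task_inst='Label the {name} of the input. Options: {options}.',
    format_inst='Answer in the form: {format_sample}.',
    name='sentiment',
    options=['pos', 'neg'],
    format_sample='sentiment: pos',
)
prompt = sketch_get_prompt(template, '@united how else would I know it was denied?')
```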
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.set_template","title":"set_template(**kwargs)
","text":"Update template by updating task instruction and/or formatting instruction.
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.get_template","title":"get_template()
","text":"Return the stringified prompt template with input slot.
Returns:
Type Descriptionstring
Stringified prompt template with input slot
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.get_prompt","title":"get_prompt(input_str: str, **kwargs)
","text":"Return the prompt for a given input.
Parameters:
Name Type Description Defaultinput_str
str
input string to fill input slot
requiredReturns:
Name Type Descriptionprompt
str
a prompt template built with given input string
"},{"location":"references/prompt/#meganno_client.prompt.PromptTemplate.preview","title":"preview(records=[])
","text":"Open up a widget to modify prompt template and preview final prompt.
Parameters:
Name Type Description Defaultrecords
list
List of input objects to be used for prompt preview
[]
"},{"location":"references/schema/","title":"Schema","text":""},{"location":"references/schema/#meganno_client.schema.Schema","title":"meganno_client.schema.Schema
","text":"The Schema class defines an annotation schema for a project.
Attributes:
Name Type Description__service
object
Service object for the connected project.
"},{"location":"references/schema/#meganno_client.schema.Schema.set_schemas","title":"set_schemas(schemas=None)
","text":"Set a user-defined schema
Parameters:
Name Type Description Defaultschemas
dict
Schema of annotation task which defines a label_schema
which is a list of Python dictionaries defining the name
of the label, the level
of the label and options
which defines a list of valid label options
Full Example:
{\n \"label_schema\": [\n {\n \"name\": \"sentiment\",\n \"level\": \"record\",\n \"options\": [\n {\n \"value\": \"pos\",\n \"text\": \"positive\"\n },\n {\n \"value\": \"neg\",\n \"text\": \"negative\"\n }\n ]\n },\n\n ]\n}\n
None
Raises:
Type DescriptionException
If response code is not successful
Returns:
Name Type Descriptionresponse
json
A json of the response
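Before sending a schema to the service, it can help to check its shape locally. The checker below is a hypothetical helper, not part of meganno_client; it only encodes the structure described above (a label_schema list whose entries have a name, a level of record or span, and options with value and text).

```python
VALID_LEVELS = {'record', 'span'}

def check_schema(schemas):
    # Return a list of human-readable problems; an empty list means the shape looks fine.
    labels = schemas.get('label_schema')
    if not isinstance(labels, list) or not labels:
        return ['label_schema must be a non-empty list']
    problems = []
    for label in labels:
        name = label.get('name', '<unnamed>')
        if 'name' not in label:
            problems.append('a label is missing its name')
        if label.get('level') not in VALID_LEVELS:
            problems.append(name + ': level must be record or span')
        for option in label.get('options', []):
            if 'value' not in option or 'text' not in option:
                problems.append(name + ': each option needs value and text')
    return problems

good = {
    'label_schema': [
        {
            'name': 'sentiment',
            'level': 'record',
            'options': [
                {'value': 'pos', 'text': 'positive'},
                {'value': 'neg', 'text': 'negative'},
            ],
        }
    ]
}
bad = {'label_schema': [{'name': 'sentiment', 'level': 'document', 'options': []}]}
```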
"},{"location":"references/schema/#meganno_client.schema.Schema.value","title":"value(active=None)
","text":"Get project schema
Parameters:
Name Type Description Defaultactive
bool
If True
, only retrieve the active (latest) schema; if False
, retrieve all previous schemas; if None
, retrieve full history.
None
"},{"location":"references/schema/#meganno_client.schema.Schema.get_active_schemas","title":"get_active_schemas()
","text":"Get the active schema for the project.
"},{"location":"references/schema/#meganno_client.schema.Schema.get_history","title":"get_history()
","text":"Get the full history of project schemas
"},{"location":"references/service/","title":"Service","text":""},{"location":"references/service/#meganno_client.service.Service","title":"meganno_client.service.Service
","text":"Service objects communicate to back-end MEGAnno services and establish connections to a MEGAnno project.
"},{"location":"references/service/#meganno_client.service.Service.__init__","title":"__init__(host=None, project=None, token=None, auth=None, port=5000)
","text":"Init function
Parameters:
Name Type Description Defaulthost
str
Host IP address for the back-end service to connect to. If None, connects to a Megagon-hosted service.
None
project
str
Project name. The name needs to be unique within the host domain.
None
token
str
User's authentication token.
None
auth
Authentication
Authentication object. Can be skipped if a valid token is provided.
None
"},{"location":"references/service/#meganno_client.service.Service.show","title":"show(config={})
","text":"Show project management dashboard in a floating dashboard.
"},{"location":"references/service/#meganno_client.service.Service.get_service_endpoint","title":"get_service_endpoint(key=None)
","text":"Get REST endpoint for the connected project. Endpoints are composed from base project url and routes for specific requests.
Parameters:
Name Type Description Defaultkey
str
Name of the specific request. Mapping to routes is stored in a dictionary SERVICE_ENDPOINTS
in constants.py
.
None
"},{"location":"references/service/#meganno_client.service.Service.get_base_payload","title":"get_base_payload()
","text":"Get the base payload for any REST request which includes the authentication token.
"},{"location":"references/service/#meganno_client.service.Service.get_schemas","title":"get_schemas()
","text":"Get schema object for the connected project.
"},{"location":"references/service/#meganno_client.service.Service.get_statistics","title":"get_statistics()
","text":"Get the statistics object for the project which supports calculations in the management dashboard.
"},{"location":"references/service/#meganno_client.service.Service.get_users_by_uids","title":"get_users_by_uids(uids: list = [])
","text":"Get user names by their unique IDs.
Parameters:
Name Type Description Defaultuids
list
list of unique user IDs.
[]
"},{"location":"references/service/#meganno_client.service.Service.get_annotator","title":"get_annotator()
","text":"Get annotator's own name and user ID. The back-end service distinguishes annotator by the token or auth object used to initialize the connection.
"},{"location":"references/service/#meganno_client.service.Service.search","title":"search(limit=DEFAULT_LIST_LIMIT, skip=0, uuid_list=None, keyword=None, regex=None, record_metadata_condition=None, annotator_list=None, label_condition=None, label_metadata_condition=None, verification_condition=None)
","text":"Search the back-end database based on user-provided predicates.
Parameters:
Name Type Description Defaultlimit
The limit of returned records in the subset.
DEFAULT_LIST_LIMIT
skip
skip index of returned subset (excluding the first skip
rows from the raw results ordered by importing order).
0
uuid_list
list of record uuids to filter on
None
keyword
Term for exact keyword searches.
None
regex
Term for regular expression searches.
None
record_metadata_condition
{\"name\": # name of the record-level metadata to filter on
\"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\",
\"value\": # value to complete the expression}
None
annotator_list
list of annotator names to filter on
None
label_condition
Label condition of the annotation.
{\"name\": # name of the label to filter on
\"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\"|\"conflicts\",
\"value\": # value to complete the expression}
None
label_metadata_condition
Label metadata condition of the annotation. Note this can be on different labels than label_condition.
{\"label_name\": # name of the associated label
\"name\": # name of the label-level metadata to filter on
\"operator\": \"==\"|\"<\"|\">\"|\"<=\"|\">=\"|\"exists\",
\"value\": # value to complete the expression}
None
verification_condition
Verification condition of the annotation.
{\"label_name\": # name of the associated label
\"search_mode\": \"ALL\"|\"UNVERIFIED\"|\"VERIFIED\"}
None
Returns:
Name Type Descriptionsubset
Subset
Subset meeting the search conditions.
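The condition dictionaries are evaluated by the back-end service, but their meaning can be illustrated with a small local stand-in. Everything below is a sketch under assumptions: the in-memory record layout and the functions matches and local_search are hypothetical, and only the comparison operators plus exists, limit, and skip are modeled.

```python
import operator

OPS = {
    '==': operator.eq,
    '<': operator.lt,
    '>': operator.gt,
    '<=': operator.le,
    '>=': operator.ge,
}

def matches(metadata, condition):
    # Evaluate one record-level metadata condition against a metadata dict.
    name = condition['name']
    op = condition['operator']
    if op == 'exists':
        return name in metadata
    if name not in metadata:
        return False
    return OPS[op](metadata[name], condition['value'])

def local_search(records, condition, limit=10, skip=0):
    # Filter in importing order, then drop the first skip hits and cap at limit.
    hits = [r for r in records if matches(r['metadata'], condition)]
    return hits[skip:skip + limit]

# Hypothetical records with a length metadata value.
records = [
    {'id': 0, 'metadata': {'length': 8}},
    {'id': 1, 'metadata': {'length': 27}},
    {'id': 2, 'metadata': {'length': 12}},
    {'id': 3, 'metadata': {}},
]
condition = {'name': 'length', 'operator': '>', 'value': 10}
first_page = local_search(records, condition, limit=1, skip=0)
second_page = local_search(records, condition, limit=1, skip=1)
```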
"},{"location":"references/service/#meganno_client.service.Service.deprecate_submit_annotations","title":"deprecate_submit_annotations(subset=None, uuid_list=[])
","text":"Submit annotations for records in a subset to the back-end service database. Results are filtered to only include annotations owned by the authenticated annotator.
Parameters:
Name Type Description Defaultsubset
Subset
The subset object containing records and annotations.
None
uuid_list
list
Additional filter. Only subset records whose uuid are in this list will be submitted.
[]
"},{"location":"references/service/#meganno_client.service.Service.submit_annotations","title":"submit_annotations(subset=None, uuid_list=[])
","text":"Submit annotations for a batch of records in a subset to the back-end service database. Results are filtered to only include annotations owned by the authenticated annotator.
Parameters:
Name Type Description Defaultsubset
Subset
The subset object containing records and annotations.
None
uuid_list
list
Additional filter. Only subset records whose uuid are in this list will be submitted.
[]
"},{"location":"references/service/#meganno_client.service.Service.import_data_url","title":"import_data_url(url='', file_type=None, column_mapping={})
","text":"Import data from a public url, currently only supporting csv files. Each row corresponds to a data record. The file needs at least two columns: one with a unique id for each row, and one with the raw data content.
Parameters:
Name Type Description Defaulturl
str
Public url for csv file
''
file_type
str
Currently only supporting type 'CSV'
None
column_mapping
dict
Dictionary with fields id
specifying id column name, and content
specifying content column name. For example, with a csv file with two columns index
and tweet
:
{\n \"id\": \"index\",\n \"content\": \"tweet\"\n}\n
{}
"},{"location":"references/service/#meganno_client.service.Service.import_data_df","title":"import_data_df(df, column_mapping={})
","text":"Import data from a pandas DataFrame. Each row corresponds to a data record. The dataframe needs at least two columns: one with a unique id for each row, and one with the raw data content.
Parameters:
Name Type Description Defaultdf
DataFrame
Qualifying dataframe
requiredcolumn_mapping
dict
Dictionary with fields id
specifying id column name, and content
specifying content column name. With a dataframe, users can import metadata at the same time. For example, given a dataframe with an id column index
, a content column tweet
, and a metadata column location
:
{\n \"id\": \"index\",\n \"content\": \"tweet\",\n \"metadata\": \"location\"\n}\n
metadata with name location
will be created for all imported data records. {}
"},{"location":"references/service/#meganno_client.service.Service.export","title":"export()
","text":"Export the annotations for all records in the project as a pandas dataframe.
Returns:
Name Type Descriptionexport_df
DataFrame
A pandas dataframe with columns 'data_id', 'content', 'annotator', 'label_name', 'label_value'
for all records in the project
set_metadata(meta_name, func, batch_size=500)
","text":"Set metadata for all records in the back-end database, based on user-defined function for metadata calculation.
Parameters:
Name Type Description Defaultmeta_name
str
Name of the metadata. Will be used to identify and query the metadata.
requiredfunc
function(raw_content)
Function that takes the raw data content as input and returns the corresponding metadata (int, string, vector, ...).
requiredbatch_size
int
Batch size for back-end database updates.
500
Example from sentence_transformers import SentenceTransformer\n\nmodel = SentenceTransformer('all-MiniLM-L6-v2')\n# set metadata generation function for service object demo\ndemo.set_metadata(\"bert-embedding\",\n lambda x: list(model.encode(x).astype(float)), 500)\n
"},{"location":"references/service/#meganno_client.service.Service.get_assignment","title":"get_assignment(annotator=None, latest_only=False)
","text":"Get workload assignment for annotator.
Parameters:
Name Type Description Defaultannotator
str
User ID to query. If set to None, use ID of auth token holder.
None
latest_only
bool
If true, return only the last assignment for the user. Else, return the set of all assigned records.
False
"},{"location":"references/statistic/","title":"Statistic","text":""},{"location":"references/statistic/#meganno_client.statistic.Statistic","title":"meganno_client.statistic.Statistic
","text":"The Statistic class contains methods to show basic statistics of the labeling project. Mostly used to back views in the monitoring dashboard.
Attributes:
Name Type Description__service
Service
Service object for the connected project.
"},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_label_progress","title":"get_label_progress()
","text":"Get the overall progress of annotation.
Returns:
Name Type Descriptionresponse
dict
A dictionary with fields total
showing the total number of data records, and annotated
showing the number of records with at least one label from at least one annotator.
get_label_distributions(label_name: str = None)
","text":"Get the class distribution of a selected label. If multiple annotators labeled the same record, aggregate using majority vote
.
Parameters:
Name Type Description Defaultlabel_name
str
Name of label as specified in the schema.
None
Returns:
Name Type Descriptionresponse
dict
A dictionary showing aggregated class frequencies. Example: {'neg': 60, 'neu': 14, 'pos': 27, 'tied_annotations': 3}
. tied_annotations
counts the number of records for which more than one class received the highest number of votes.
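The aggregation described above can be sketched as follows. This is an illustrative standard-library version under stated assumptions, not the service's actual implementation; the function name aggregate_majority and the input layout (one list of votes per record) are hypothetical.

```python
from collections import Counter

def aggregate_majority(votes_per_record):
    # votes_per_record: one list of class labels per record, one label per annotator.
    distribution = Counter()
    for votes in votes_per_record:
        if not votes:
            continue  # records without any label are skipped
        counts = Counter(votes).most_common()
        top_count = counts[0][1]
        winners = [label for label, n in counts if n == top_count]
        if len(winners) > 1:
            # More than one class shares the highest vote count.
            distribution['tied_annotations'] += 1
        else:
            distribution[winners[0]] += 1
    return dict(distribution)

result = aggregate_majority([['pos', 'pos', 'neg'], ['neg'], ['pos', 'neg']])
```

Here the first record resolves to pos, the second to neg, and the third is counted as a tie.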
get_annotator_contributions()
","text":"Get contributions of annotators in terms of records labeled.
Returns:
Name Type Descriptionresponse
dict
A dictionary where keys are annotator IDs and values are total numbers of annotated records by each annotator.
"},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_annotator_agreements","title":"get_annotator_agreements(label_name: str = None)
","text":"Get pairwise agreement score between all contributing annotators to the project, on the specified label. The default agreement calculation method is cohen_kappa
.
Parameters:
Name Type Description Defaultlabel_name
str
Name of label as specified in the schema.
None
Returns:
Name Type Descriptionresponse
dict
A dictionary where keys are pairs of annotator IDs, and values are their agreement scores. The higher the score, the more frequently the pair of annotators agree.
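For intuition about the score, here is a minimal sketch of Cohen's kappa for one pair of annotators, written with the standard library only. It assumes both annotators labeled the same records in the same order; the function name cohen_kappa is illustrative and the service may compute the score differently.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    # Labels from two annotators on the same records, in the same order.
    n = len(labels_a)
    # Observed agreement: fraction of records with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's class frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical class throughout
    return (observed - expected) / (1 - expected)

score = cohen_kappa(['pos', 'neg', 'pos', 'neg'], ['pos', 'neg', 'neg', 'neg'])
```

Kappa corrects raw agreement for agreement expected by chance, so a pair that agrees on three of four records can still score well below 0.75.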
"},{"location":"references/statistic/#meganno_client.statistic.Statistic.get_embeddings","title":"get_embeddings(label_name: str = None, embed_type: str = None)
","text":"Return a 2-dimensional t-SNE projection of the text embeddings for data records, together with their aggregated labels (using majority vote). Used for the projection view in the monitoring dashboard.
Parameters:
Name Type Description Defaultlabel_name
str
Name of label as specified in the schema.
None
embed_type
str
The meta_name of the embedding to project.
None
Returns:
Name Type Descriptionresponse
dict
A dictionary with fields agg_label
showing aggregated class label, x_axis
and y_axis
showing projected 2d coordinates.
meganno_client.subset.Subset
","text":"The Subset class is used to represent a group of data records.
Attributes:
Name Type Description__data_uuids
list
List of unique identifiers of data records in the subset.
__service
Service
Connected backend service
__my_annotation_list
list
Local cache of the record and annotation view of the subset owned by service.annotator_id, with all available metadata.
"},{"location":"references/subset/#meganno_client.subset.Subset.__init__","title":"__init__(service, data_uuids=[], job_id=None)
","text":"Initialize a Subset.
Parameters:
Name Type Description Defaultservice
Service
Service-class object identifying the connected backend service and corresponding data storage
requireddata_uuids
list
List of data uuids to be included in the subset
[]
"},{"location":"references/subset/#meganno_client.subset.Subset.get_uuid_list","title":"get_uuid_list()
","text":"Get list of unique identifiers for all records in the subset.
Returns:
Name Type Description__data_uuids
list
List of data uuids included in Subset
"},{"location":"references/subset/#meganno_client.subset.Subset.value","title":"value(annotator_list: list = None)
","text":"Return the cached data and annotations of the service owner, or retrieve them live (not cached) for other annotators.
Parameters:
Name Type Description Defaultannotator_list
list
If None, return the cached annotations of the service owner. Otherwise, fetch live annotations from the listed annotators.
None
Returns:
Name Type Descriptionsubset_annotation_list
list
See __get_annotation_list
for description and example.
get_annotation_by_uuid(uuid)
","text":"Return the annotation for a particular data record (specified by uuid).
Parameters:
Name Type Description Defaultuuid
str
The uuid of the data record.
requiredReturns:
Name Type Descriptionannotation
dict
Annotation for specified data record if it exists else None
"},{"location":"references/subset/#meganno_client.subset.Subset.show","title":"show(config={})
","text":"Visualize the current subset in an in-notebook annotation widget.
Development note: initializes an Annotation widget and creates a unique reference to the associated subset and service.
Parameters:
Name Type Description Defaultconfig
dict
Configuration for default view of the widget.
- view : \"single\" | \"table\", default \"single\"\n- mode : \"annotating\" | \"reconciling\", default \"annotating\"\n- title: default \"Annotation\"\n- height: default 300 (pixels)\n
{}
"},{"location":"references/subset/#meganno_client.subset.Subset.set_annotations","title":"set_annotations(uuid=None, labels=None)
","text":"Set the annotation for a particular data record with the specified labels.
Parameters:
Name Type Description Defaultuuid
str
The uuid of the data record.
None
labels
dict
The labels for the data record at record and span level, with the following structure:
- \"labels_record\" : list\n A list of record-level labels\n- \"labels_span\" : list\n A list of span-level labels\n\nExamples\n-------\n\nExample of setting an annotation with the desired record and span level labels:\n```json\n{\n \"labels_record\": [\n {\n \"label_name\": \"sentiment\",\n \"label_value\": [\"neu\"]\n }\n ],\n\n \"labels_span\": [\n {\n \"label_name\": \"sentiment\",\n \"label_value\": [\"neu\"],\n \"start_idx\": 10,\n \"end_idx\": 20\n }\n ]\n}\n```\n
None
Raises:
Type DescriptionException
If uuid or labels is None
Returns:
Name Type Descriptionlabels
dict
Updated labels for uuid annotated by user
"},{"location":"references/subset/#meganno_client.subset.Subset.get_reconciliation_data","title":"get_reconciliation_data(uuid_list=None)
","text":"Return the list of reconciliation data for all data entries specified by the user. The reconciliation data for one data record consists of the annotations made on it by all annotators.
Parameters:
Name Type Description Defaultuuid_list
list
List of uuids provided by the user. If None, use all records in the subset.
None
Returns:
Name Type Descriptionreconciliation_data_list
list
List of reconciliation data for each uuid with the following keys: annotation_list
which specifies all the annotations for the uuid, data
which contains the raw data specified by the uuid, metadata
which stores additional information about the data, tokens
, and the uuid
of the data record. Full example:
{\n \"annotation_list\": [\n {\n \"annotator\": \"pwOA1N9RKZVJM8VZZ7w8VcT8lp22\",\n \"labels_record\": [],\n \"labels_span\": []\n },\n {\n \"annotator\": \"IAzgHOxyeLQBi5QVo7dQR0p2DpA2\",\n \"labels_record\": [\n {\n \"label_name\": \"sentiment\",\n \"label_value\": [\"pos\"]\n }\n ],\n \"labels_span\": []\n }\n ],\n \"data\": \"@united obviously\",\n \"metadata\": [],\n \"tokens\": [],\n \"uuid\": \"ee408271-df5d-435c-af25-72df58a21bfe\"\n}\n
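Given an entry with the structure shown above, the per-annotator values for one label can be collected as in this sketch. The helper record_label_values and the shortened annotator IDs are hypothetical; only the dictionary keys come from the example.

```python
def record_label_values(entry, label_name):
    # Map each annotator to the record-level value they gave for label_name.
    # Annotators who did not set that label are omitted.
    values = {}
    for annotation in entry['annotation_list']:
        for label in annotation['labels_record']:
            if label['label_name'] == label_name:
                values[annotation['annotator']] = label['label_value']
    return values

# Same shape as the full example, with placeholder annotator IDs.
entry = {
    'annotation_list': [
        {'annotator': 'annotator_a', 'labels_record': [], 'labels_span': []},
        {'annotator': 'annotator_b',
         'labels_record': [{'label_name': 'sentiment', 'label_value': ['pos']}],
         'labels_span': []},
    ],
    'data': '@united obviously',
    'metadata': [],
    'tokens': [],
    'uuid': 'ee408271-df5d-435c-af25-72df58a21bfe',
}
values = record_label_values(entry, 'sentiment')
```

Comparing the collected values across annotators is one way to spot records that need reconciliation.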
"},{"location":"references/subset/#meganno_client.subset.Subset.suggest_similar","title":"suggest_similar(record_meta_name, limit=3)
","text":"For each data record in the subset, suggest similar data records by retrieving the most similar records from the pool, based on metadata (e.g., embedding) distance.
Parameters:
Name Type Description Defaultrecord_meta_name
str
The name of the metadata (e.g., \"bert-embedding\") on which similarity is calculated.
requiredlimit
int
The number of similar records to return.
3
Raises:
Type DescriptionException
If response code is not successful
Returns:
Name Type Descriptionsubset
Subset
A subset of similar data entries
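The idea of retrieving records by embedding distance can be sketched as below. The source does not state which distance the service uses, so cosine similarity is an assumption here, and the names cosine_similarity, most_similar and pool are hypothetical.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors; 0.0 if either has zero length.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(query, pool, limit=3):
    # pool maps a record uuid to its embedding vector.
    # Returns the uuids of the closest records, most similar first.
    ranked = sorted(pool, key=lambda uid: cosine_similarity(query, pool[uid]),
                    reverse=True)
    return ranked[:limit]

pool = {'a': [1.0, 0.0], 'b': [0.0, 1.0], 'c': [0.9, 0.1]}
top = most_similar([1.0, 0.0], pool, limit=2)
```

The limit argument plays the same role as the limit parameter above: it caps how many neighbours are returned per query record.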
"},{"location":"references/subset/#meganno_client.subset.Subset.assign","title":"assign(annotator)
","text":"Assign the current subset as payload to an annotator.
Parameters:
Name Type Description Defaultannotator
str
Annotator ID.
required"}]} \ No newline at end of file diff --git a/1.5.3/sitemap.xml.gz b/1.5.3/sitemap.xml.gz index 4dd016f..17ba6f6 100644 Binary files a/1.5.3/sitemap.xml.gz and b/1.5.3/sitemap.xml.gz differ