Gremlin crowdsource #121

Open

wants to merge 12 commits into base: master

Conversation

@sara-02 (Contributor) commented Nov 13, 2017

@sara-02 changed the title from "Gremlin crowdsource" to "[WIP]: Gremlin crowdsource" on Nov 13, 2017
@sara-02 (Contributor, Author) commented Nov 13, 2017

For user story openshiftio/openshift.io#1286

@sara-02 requested a review from pkajaba on November 28, 2017 16:48
@centos-ci (Collaborator)

@sara-02 Your image is available in the registry: docker pull registry.devshift.net/bayesian/kronos:SNAPSHOT-PR-121

1 similar comment

@sara-02 changed the title from "[WIP]: Gremlin crowdsource" to "Gremlin crowdsource" on Nov 28, 2017
@miteshvp (Contributor) left a comment

Some minor nitpick, but LGTM otherwise

input_package_topic_data_store,
output_package_topic_data_store,
additional_path)
untagged_pakcage_data = TagListPruner.clean_file(package_file_name,
Contributor:

Typo. You may want to fix this and all subsequent occurrences.

result_package_topic_json = []
untagged_pakcage_data = {}
Contributor:

typo again

# TODO: use singleton object, with updated package_topic_list
if ecosystem in untagged_pakcage_data.keys():
    current_untagged_set = set(untagged_pakcage_data[ecosystem])
    new_untagged_set = current_untagged_list.union(
Contributor:

Did you intend to use current_untagged_set here instead of the list?
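
For reference, a minimal sketch of the fix the reviewer is pointing at: the union should be taken on the set that was just built, not the undefined list. The incoming new_packages name is a hypothetical stand-in for illustration only.

if ecosystem in untagged_pakcage_data:
    current_untagged_set = set(untagged_pakcage_data[ecosystem])
    # new_packages is hypothetical here; the point is to call .union() on the set above
    new_untagged_set = current_untagged_set.union(new_packages)
    untagged_pakcage_data[ecosystem] = list(new_untagged_set)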

@centos-ci (Collaborator)

@sara-02 Your image is available in the registry: docker pull registry.devshift.net/bayesian/kronos:SNAPSHOT-PR-121

@miteshvp (Contributor)

LGTM. Will wait for @pkajaba approval

@centos-ci (Collaborator)

@sara-02 Your image is available in the registry: docker pull registry.devshift.net/bayesian/kronos:SNAPSHOT-PR-121

1 similar comment

@pkajaba (Contributor) commented Dec 4, 2017

@sara-02 Please rebase instead of a merge commit.

@@ -12,6 +12,8 @@ AWS_BUCKET_NAME = os.environ.get("AWS_BUCKET_NAME","dev-stack-analysis-clean-dat
KRONOS_SCORING_REGION = os.environ.get("KRONOS_SCORING_REGION", "")
KRONOS_MODEL_PATH = os.environ.get("KRONOS_MODEL_PATH", KRONOS_SCORING_REGION + "/github/")
DEPLOYMENT_PREFIX = os.environ.get("DEPLOYMENT_PREFIX", "")

GREMLIN_REST_URL = "http://{host}:{port}".format(
Contributor:

I have one question, not strictly related to this PR: why do you have a template instead of just having config.py?

Contributor Author:

So that the credentials don't get committed by mistake if someone changes their config.py. config.py is in .gitignore.

Contributor:

Oh, I can see your point now, but you are sourcing all those values from environment variables.

Contributor Author:

Not always; sometimes I write them directly in the config. It is easier that way, as they don't change over the testing period.

@pkajaba (Contributor) Dec 6, 2017

I would go with an approach where every developer has a script in which those secrets are stored. This script would not be in the repo (it might be in .gitignore). It would basically export the stored secrets:

#!/bin/bash

export TOP_SECRET1="foo_bar"
export TOP_SECRET2="foo_bar"
# ...
export TOP_SECRET_N="foo_bar"
./run_actual_code.py

It's another extra script, but I find it clearer than copying configs every time.

Contributor:

@sara-02 Can you elaborate more?

Contributor:

through environment variables.

Contributor Author:

OK, will add a PR for that separately.

"""Generate the clean aggregated package_topic list as required by Gnosis.

:param input_package_topic_data_store: The Data store to pick the package_topic files from.
:param output_package_topic_data_store: The Data store to save the clean package_topic to.
:param additional_path: The directory to pick the package_topic files from."""

if mode == "test":
Contributor:

Why do you need this mode in the first place? You should rename it to data_path, and then you don't need this if.

Contributor Author:

Because the value of data_path is different when running the test cases; that is why mode is needed.

Contributor:

And my point here is that you don't need to solve this through conditions. You can set this value once. This will work because you are not running tests and real code in the same instance, are you?

Contributor Author:

So basically pass the APOLLO_PATH instead of the mode value?

Contributor:

Pretty much, or you can have it as a class variable, for example.

Contributor Author:

@pkajaba Ack, thanks. Have updated using a temp_path variable instead of the mode setting.
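
As an illustration of the shape this change takes (the constructor signature and the default path below are assumptions, not the PR's actual code), the path is simply passed in, so no mode flag or if-branch is needed:

class TagListPruner:
    def __init__(self, input_store, output_store, temp_path="/tmp/apollo"):
        # Callers supply the path; tests just pass their own test directory.
        self.input_store = input_store
        self.output_store = output_store
        self.temp_path = temp_path

# In tests (hypothetical path):
pruner = TagListPruner(input_store=None, output_store=None, temp_path="tests/data/apollo_temp")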

@centos-ci (Collaborator)

@sara-02 Your image is available in the registry: docker pull registry.devshift.net/bayesian/kronos:SNAPSHOT-PR-121

1 similar comment

def __init__(self, src_dir):
self.src_dir = src_dir
# ensure path ends with a forward slash
self.src_dir = self.src_dir if self.src_dir.endswith("/") else self.src_dir + "/"
self.src_dir = self.src_dir if self.src_dir.endswith(
Contributor:

You don't need this if you use os.path.join everywhere src_dir is used.

Contributor Author:

Currently, removing this will cause other test cases to fail; it requires a module-wide fix to use os.path.join. I will create an issue and work on a PR to fix this for all files.

Contributor:

I already fixed it for all files that were present in the source code when I did my Python 3 changes.

Contributor:

The "apollo" module should be the only problem, ergo not too hard to fix.
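
For reference, with os.path.join the trailing-slash normalisation in __init__ becomes unnecessary; a generic sketch, not the repo's actual code:

import os

def full_path(src_dir, file_name):
    # os.path.join inserts the separator itself, whether or not src_dir ends with "/"
    return os.path.join(src_dir, file_name)

full_path("/tmp/apollo", "packages.json")   # '/tmp/apollo/packages.json'
full_path("/tmp/apollo/", "packages.json")  # '/tmp/apollo/packages.json'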

def test_gremlin_updater_generate_payload(self):
    expected_pay_load = {
        'gremlin':
            "g.V().has('ecosystem', 'ruby')." +
Contributor:

You don't need the + sign here if you use Python's implicit string literal concatenation.
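
That is, adjacent string literals are joined by the parser at compile time, so the query can span lines without +. The continuation below is illustrative, not the PR's actual query:

expected_pay_load = {
    'gremlin': "g.V().has('ecosystem', 'ruby')."
               "count()"   # adjacent literals are concatenated automatically; "count()" is illustrative
}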

@pkajaba (Contributor) commented Dec 12, 2017

@sara-02 would you kindly rebase? :-)

@pkajaba (Contributor) commented Dec 12, 2017

@sara-02 It's looking good to me, but I would appreciate some unit tests for the new functions.

Have added them here. Please leave your review comments on this file.

@centos-ci (Collaborator)

@sara-02 Your image is available in the registry: docker pull registry.devshift.net/bayesian/kronos:SNAPSHOT-PR-121

@sara-02 (Contributor, Author) commented Dec 13, 2017

@pkajaba PTAL again.

@pkajaba (Contributor) left a comment

@sara-02 would you kindly take a look?

file_list = local_data_obj.list_files()
for file_name in file_list:
    data = local_data_obj.read_json_file(file_name)
    # TODO: use a singleton object with updated datafile.
Contributor:

Can you implement this as a singleton? Anyway, I don't really like that you are initializing an instance of the object inside the same class as the initialized object.

Is it some design pattern, or what is the reason behind it?

Contributor Author:

Ack
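
One common way to honour the TODO is to cache a single instance on the class; this is a generic sketch (the class and attribute names are hypothetical), not the PR's implementation:

class PackageTopicStore:
    _instance = None

    @classmethod
    def instance(cls, *args, **kwargs):
        # Create the object on first use and return the same one afterwards.
        if cls._instance is None:
            cls._instance = cls(*args, **kwargs)
        return cls._instance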

for each_ecosystem in self.untagged_data:
    package_list = self.untagged_data[each_ecosystem]
    pck_len = len(package_list)
    if pck_len == 0:
Contributor:

I would delete this if, because it is not required in the context of this method.

Contributor Author:

Ack.

        # If pck_len =0 then, no package of that ecosystem requires
        # tags. Hence, do nothing.
        continue
    for index in range(0, pck_len, 100):
Contributor:

Is 100 really the value you want to have here? It means that just every 100th element in the range will be used.

Contributor Author:

Yes, this can be passed as a parameter, but that is what is intended, as we want to break the list into chunks of 100 packages each.

Contributor:

ack
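
For context, the stepped range combined with slicing walks the list in batches of 100 rather than sampling every 100th element; a toy example:

package_list = ["pkg-{}".format(i) for i in range(250)]   # toy data
for index in range(0, len(package_list), 100):
    sub_package_list = package_list[index:index + 100]
    print(len(sub_package_list))                          # prints 100, 100, 50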

        continue
    for index in range(0, pck_len, 100):
        sub_package_list = package_list[index:index + 100]
        pay_load = self.generate_payload(
Contributor:

Why do you have an underscore here?

Contributor Author:

I don't see any dangling underscore :/

Contributor:

"payload" (without the underscore) is the correct word, right?

Contributor Author:

Ack

            each_ecosystem, sub_package_list)
        self.execute_gremlin_dsl(pay_load)

    def generate_payload(self, ecosystem, package_list):
Contributor:

This method can be a classmethod/staticmethod.

Contributor Author:

Ack
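
In other words, since the method depends only on its arguments, it can be declared static. A rough sketch (the class name is taken from the tests above; the query shape is illustrative, not the exact one built in the PR):

class GraphUpdater:
    @staticmethod
    def generate_payload(ecosystem, package_list):
        # No instance state is needed; everything comes from the arguments.
        query = "g.V().has('ecosystem', '{}')".format(ecosystem)   # illustrative query shape
        return {'gremlin': query, 'bindings': {'packages': package_list}}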

        self.untagged_data = untagged_data

    @classmethod
    def generate_and_update_packages(cls, apollo_temp_path):
Contributor:

You have some tests but you are not testing this method. Any specific reason?

@sara-02 (Contributor, Author) Dec 15, 2017

@pkajaba This method talks to the graph, so every time we test it we need a working instance of gremlin-http up and running.

Contributor:

So I would advise mocking the HTTP response here to emulate the behavior of the graph DB.
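
For example, assuming update_graph() posts its query via the requests library (an assumption for this sketch), the HTTP layer can be patched in a unit test so no live gremlin-http instance is needed:

from unittest import TestCase, mock

class TestGraphUpdaterMocked(TestCase):
    @mock.patch('requests.post')   # assumption: the updater posts via requests
    def test_update_graph(self, mock_post):
        mock_post.return_value.json.return_value = {'result': {'data': []}}   # canned Gremlin reply
        # GraphUpdater(...).update_graph() would now talk to the mock instead of a real
        # graph DB; assertions can then be made on mock_post.call_args.
        self.assertFalse(mock_post.called)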

            graph_obj.update_graph()
            local_data_obj.remove_json_file(file_name)

    def update_graph(self):
Contributor:

The same comment about testing applies here.

Contributor Author:

Same reason.

'str_packages': ['service_identity']}}

unknown_data_obj = LocalFileSystem(APOLLO_TEMP_TEST_DATA)
self.assertTrue(unknown_data_obj is not None)
Contributor:

assert unknown_data_obj would be enough here.

Contributor Author:

Ack
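
For completeness, either shorter form expresses the same check:

assert unknown_data_obj                    # plain assert, as suggested above
self.assertIsNotNone(unknown_data_obj)     # or unittest's dedicated assertion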

# IMPORTANT: TestGraphUpdater needs to run after TestTagListPruner

# Test class TestGraphUpdater(TestCase):
def test_gremlin_updater_generate_payload(self):
Contributor:

What is this method supposed to test? You don't really have to read the package list from the file system; just create a fixture of the package list and test whether the query is created correctly.

Contributor Author:

Ack, I don't need to load the list, but the generation of the list needs to be checked, so I am adding it to the previous test instead of this one.

Contributor:

Yeah, you can split it, but if the generation of the list really has to be tested, it should be extracted into a function.

Contributor:

@sara-02 ^^ :-)

Contributor Author:

I am testing the extraction here itself and then deleting the files. Otherwise it will create a dependency between tests, as we would have to make sure that the extraction always gets tested after the generation if they are two separate tests.

Contributor Author:

This function only checks the payload now.

class TestPruneAndUpdate(TestCase):

    # Test Class TagListPruner
    def test_generate_and_save_pruned_list_local(self):
Contributor:

I am struggling to see which method this test is testing. Can you elaborate?

Contributor Author:

The input list contains more than 4 tags; the prune method will generate a tag list of up to 4 tags based on frequency. So this test checks whether the desired 4 tags are generated or not.
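
For reference, the behaviour under test (keep only the four most frequent tags) can be sketched with collections.Counter; this is an illustration, not necessarily the PR's actual pruning logic:

from collections import Counter

def prune_tags(tag_list, keep=4):
    # Return the `keep` most frequent tags from the raw list.
    return [tag for tag, _ in Counter(tag_list).most_common(keep)]

prune_tags(['web', 'web', 'http', 'cli', 'json', 'web', 'http'])   # ['web', 'http', 'cli', 'json']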

Contributor:

But which method does it test in the code? I can't find a test_generate_and_save_pruned_list_local method in the repository.

Unit tests should test methods and their behavior on various inputs.

@centos-ci (Collaborator)

@sara-02 Your image is available in the registry: docker pull registry.devshift.net/bayesian/kronos:SNAPSHOT-PR-121

@sara-02 (Contributor, Author) commented Dec 15, 2017

@pkajaba PTAL, I think all major concerns have been addressed. Two things that need a separate PR: the env.sh for the repo and the use of os.path.join where it is not yet in use.
